Abstract

Mutation has traditionally been regarded as an important operator in evolutionary algorithms. In particular, 
there have been many experimental studies which showed the effectiveness of adapting mutation rates for various 
static optimization problems. Given the perceived effectiveness of adaptive and self-adaptive mutation for static 
optimization problems, there have been speculations that adaptive and self-adaptive mutation can benefit dynamic
optimization problems even more, since adaptation and self-adaptation are capable of following a dynamic
environment. However, few theoretical results are available that rigorously analyze evolutionary algorithms for
dynamic optimization problems. It is unclear when adaptive and self-adaptive mutation rates are likely to be
useful for evolutionary algorithms in solving dynamic optimization problems. This paper provides the first rigorous
analysis of adaptive mutation and its impact on the computation times of evolutionary algorithms in solving
certain dynamic optimization problems. More specifically, for both individual-based and population-based EAs,
we show that any time-variable mutation rate scheme will not significantly outperform a fixed mutation
rate on some dynamic optimization problem instances. The proofs also offer some insights into conditions under
which any time-variable mutation scheme is unlikely to be useful and into the relationships between the problem
characteristics and algorithmic features (e.g., different mutation schemes).

Keywords. Evolutionary algorithm, Mutation rate, Adaptation, Dynamic optimization. 

1 Introduction 

Evolutionary Algorithms (EAs) are stochastic search algorithms, which have been used to solve many optimization 
problems in real-world applications [331 EH HO]. As one of the primary operators in the framework of EAs, the
mutation operator has significant influence on the performance of an EA. During the past decades, various strategies
of controlling the parameters of the mutation operator have been developed to promote the performance of EAs.
Some of them were concerned with the mutation operators for binary search space (e.g., [3J (201 [2H 12H1 ESI SSI HE]),
while some were dedicated to continuous search space [TJ [221 [321 [SO]. In this paper, we restrict our investigation to
the former. 

As the most commonly used mutation operator for binary search space, the so-called bitwise mutation operator 
flips each bit of an individual (solution) with a uniform probability $P_m$, where $P_m$ is called the mutation rate. Early
investigations often employed a fixed mutation rate over the whole optimization process [20], but further studies have
revealed that a time-variable mutation rate scheme might be better than a fixed mutation rate. According to Bäck
[5], Hinterding et al. [27] and Thierens [48], there are three classes of time-variable mutation rate schemes: dynamic
mutation rate schemes, adaptive mutation rate schemes and self-adaptive mutation schemes. During the past decades,
a large number of empirical investigations have been dedicated to showing the advantages of various time-variable
mutation rate schemes. Holland [2H] proposed a time-variable mutation rate scheme for the Genetic Algorithm (GA).
Fourteen years later, Fogarty [21] designed a number of dynamic mutation rate schemes with which the performance
of GAs was significantly improved on some static optimization problems. Inspired by Evolution Strategies (ES), Bäck
[2j proposed a self-adaptive mutation scheme for GAs for the continuous search space. Later he carried out both
theoretical analysis and empirical study to show that for multimodal static optimization problems there might exist
some "Optimal Mutation Rate Schedule" that can accelerate the search of EAs [2j. Srinivas and Patnaik proposed a
GA with an adaptive mutation rate scheme and an adaptive crossover scheme (Adaptive Genetic Algorithm, AGA) in [33].
They empirically showed that AGA outperformed the Simple GA (SGA) on a number of static benchmark optimization
problems. Smith and Fogarty [43] carried out empirical comparisons between an EA with a self-adaptive mutation
rate scheme and a number of EAs with different fixed mutation rates, and they found that the former outperformed
the latter on a number of static optimization problems. Thierens [48] designed two adaptive mutation rate schemes,
whose advantages over a fixed mutation rate were demonstrated by experimental studies.

In comparison with empirical studies, the theoretical investigations on time-variable mutation rate schemes are 
much fewer. Droste, Jansen and Wegener [18] [29] analyzed a (1 + 1) EA with a dynamic mutation rate scheme,
and they proved that the expected number of generations spent by this (1 + 1) EA on the so-called PathToJump
problem is $O(n^2 \log n)$. In contrast, with a probability that converges to 1 (with respect to the problem size $n$),
the (1 + 1) EA with the fixed mutation rate $1/n$ will spend a super-polynomial number of generations to optimize
the above PathToJump problem. On the basis of [18] and [29], Jansen and Wegener [30] carried out further
theoretical investigations for the above (1 + 1) EA (with a dynamic mutation rate), and they presented a number of
time complexity results on some well-known benchmark problems, such as OneMax [3] H21 H] and LeadingOnes
[121 141]. The investigations in [30] show that the specific dynamic mutation rate scheme outperforms (in terms of
the runtime of the (1 + 1) EA) the fixed mutation rate $1/n$ significantly on PathToJump, while it is slightly inferior on
OneMax and LeadingOnes. As we might expect, there have also been a few studies demonstrating the potential
weakness of some specific time-variable mutation rate schemes (e.g., ED]), but they are a drop in the bucket compared
with the positive results. Besides, these negative results only showed that no time-variable mutation rate
scheme is universally good for an EA over a large variety of problems, which does not exclude the usefulness of
time-variable mutation rate schemes. In other words, no general result is available so far demonstrating the complete
failure of time-variable mutation rate schemes under certain conditions.

In this paper, we prove for both individual-based and population-based EAs that time-variable mutation rate 
schemes (including any dynamic, adaptive and self-adaptive mutation schemes) cannot significantly outperform a 
well-chosen fixed mutation rate on some Dynamic Optimization Problems (DOPs) [9ll33j. Unlike static optimization
problems that maintain problem instances, stationary objective functions and constraints consistently, for DOPs, "the
objective function, the problem instance, or constraints may change over time" [33]. In most investigated DOPs, the
above changes will directly result in movements of the global optima [7] [37] [49]. However, such movements
in different DOPs can follow distinctive patterns. To representatively characterize moving global optima in DOPs, we
employ the so-called BDOP class in our theoretical investigations, which models the movements of global optima as
random walks of binary strings in the solution space. The BDOP class is a representative DOP model for theoretical
investigations, which consists of DOPs with different dynamic degrees. Concretely, the dynamic degree of a DOP
belonging to the BDOP class (named BDOP in the paper) mainly depends on the value of the so-called shifting rate
$\sigma \in (0, 1/2]$, which is a parameter controlling the random walks of BDOPs' global optima. A larger $\sigma$ tends to lead
to a larger frequency and size of change for the moving global optimum.

Intuitively, the simplest way to cope with a DOP is to regard the DOP as a new static optimization problem 
after every change. In this way, a DOP can be solved by traditional EAs or other problem-specific algorithms 
designed for static optimization problems. However, this kind of approach requires vast computational resources
(e.g., computation time), which is not practical for many real-world DOPs (e.g., the dynamic vehicle routing problem
[10] and the online scheduling problem [15]). Under this circumstance, intensive investigations have been carried out
to understand the behaviors of EAs on DOPs or to design better EAs for DOPs, among which the theoretical 
investigations are often restricted to the (1 + 1) EA. From a theoretical perspective, Rohlfshagen et al. studied how
the frequency and magnitude of the optimum's movement influence the performance of the (1 + 1) EA for dynamic
optimization [39]. Surprisingly, on two specific problems called Magnitude and Balance, they showed that larger
frequency and magnitude do not necessarily lead to a worse performance of the EA. Stanhope and Daida [46] studied
the fitness dynamics of the (1 + 1) EA on the so-called dynamic BitMatching problem that was frequently used in
previous empirical investigations [171 1491 152]. By a similar approach, Branke and Wang [11] compared the (1 + 1) and
(1,2) EAs on the dynamic BitMatching problem. Droste [THE] studied the time complexity of the (1 + 1) EA
with both one-bit mutation and bitwise mutation (with mutation rate $1/n$) on the dynamic BitMatching problem.
These results established the theoretical foundations of the field. However, theoretical investigation that aims at
revealing the relationship between the adaptation of mutation rate and the performance of EAs on DOPs is still lacking,
though there have been some related empirical studies [35J ES].

In this paper, we demonstrate theoretically the relationship between time-variable mutation schemes and the 
performances of both individual-based and population-based EAs on the BDOP class. In our study, we still adopt 
the classical time complexity criterion, the first hitting time [24] [25] [26], to measure the performances of EAs
theoretically. Based on this measure, an EA is said to be efficient on a DOP if and only if the corresponding first
hitting time is polynomial in the problem size with a probability that is at least the reciprocal of a polynomial function
of the problem size $n$. Otherwise, the EA is said to be inefficient (on the DOP). According to the popular perspective
that adaptation of the mutation rate might be helpful, one might expect that there is some time-variable mutation
scheme that can improve the performances of EAs so that they can cope with BDOPs efficiently. However, in this
paper, we find that once the asymptotic order of the shifting rate $\sigma$ is larger than $\log n/n^2$ (i.e., $\sigma = \omega(\log n/n^2)$), the
BDOPs will become essentially hard for the (1 + 1) EA with any time-variable mutation scheme in which the mutation
rate at every generation is upper bounded by $1 - 1/\log n$; when the asymptotic order of the shifting rate $\sigma$ is larger
than $\log n/n$ (i.e., $\sigma = \omega(\log n/n)$), the BDOPs will become essentially hard for the (1 + $\lambda$) EA$^1$ ($\lambda$ can be any
polynomial function of $n$) with any time-variable mutation scheme in which the mutation rate at every generation is
upper bounded by $1 - 1/\log n$. Case studies on an instance of the BDOP class called the BitMatchinGd problem
further demonstrate the limitations of any time-variable mutation scheme via concrete time complexity results on
both (1 + 1) and (1 + $\lambda$) EAs, and also show the positive impact of population on the performances of EAs.

The main contributions of this paper and their significance are summarized as follows. First, our investigation
provides theoretical evidence for the view that, compared with a fixed mutation rate, adopting some time-variable
mutation rate scheme is not always a significantly better choice. It is the first time that some general results for any
time-variable mutation rate scheme are given and proven rigorously. Second, this paper substantially increases the
understanding of the role of mutation in the context of DOPs, and the relationship between time-variable mutation rate
schemes and time complexities of EAs is investigated theoretically in depth. Third, by comparing individual-based
and population-based EAs on BDOPs, our investigations reveal theoretically for the first time that population
may have a positive impact on the performances of EAs for solving DOPs.

The rest of the paper is organized as follows: Section [2] introduces the DOPs and algorithms analyzed in this 
paper. Section [3] studies the impact of time-variable mutation rate schemes on performance of the (1 + 1) EA over
the BDOP class. Section [4] further studies the impact of time-variable mutation rate schemes on performance of
the (1 + $\lambda$) EA over the BDOP class. Section M offers some discussions, interpretations and generalization for the
theoretical results obtained in this paper. Section [5] concludes the whole paper. 

2 Problem and Algorithm 

In this section, we introduce some preliminaries for this paper, including the notations of asymptotic orders, the 
BDOP class and EAs discussed in this paper. 

2.1 Preliminaries 

To facilitate our analysis, we first introduce some notations that are used in comparing the asymptotic growth order 
of functions. Let $g_1 = g_1(n)$ and $g_2 = g_2(n)$ be two positive functions of $n$, then [32]:

• $g_1 = O(g_2)$, iff $\exists n_0 \in \mathbb{N}, c \in \mathbb{R}^+ : \forall n > n_0, g_1(n) \le c\,g_2(n)$ ($g_1$ is asymptotically bounded above by $g_2$ up to a
constant factor, and the asymptotic order of $g_1$ is no larger than that of $g_2$);

• $g_1 = \Omega(g_2)$, iff $g_2 = O(g_1)$ (the asymptotic order of $g_1$ is no smaller than that of $g_2$);

• $g_1 = \Theta(g_2)$, iff $g_1 = O(g_2)$ and $g_1 = \Omega(g_2)$ both hold (the asymptotic order of $g_1$ is the same as that of $g_2$);

• $g_1 = o(g_2)$, iff $\lim_{n \to \infty} g_1(n)/g_2(n) = 0$ ($g_1/g_2$ approaches 0 as $n \to \infty$, and the asymptotic order of $g_1$ is
smaller than that of $g_2$);

• $g_1 = \omega(g_2)$, iff $g_2 = o(g_1)$ (the asymptotic order of $g_1$ is larger than that of $g_2$).

Moreover, to distinguish the polynomial functions from the super-polynomial functions of $n$, we utilize the following
notations:

• $g_1 \prec \mathrm{Poly}(n)$ and $1/g_1 \succ 1/\mathrm{Poly}(n)$ both hold iff $\exists c \in \mathbb{R}^+ : g_1(n) = O(n^c)$;

• $g_1 \succ \mathrm{SuperPoly}(n)$ and $1/g_1 \prec 1/\mathrm{SuperPoly}(n)$ both hold iff $\forall c \in \mathbb{R}^+ : g_1(n) = \omega(n^c)$.

$^1$ The (1 + $\lambda$) EA is a population-based EA which maintains a unique parent individual and generates $\lambda$ offspring individuals at each
generation.

In this paper, the above notations are further utilized in representing the asymptotic orders related to probabilities: 

Definition 1 (Overwhelming Probability and Super-polynomially Small Probability). A probability $P_1$
is regarded as "an overwhelming probability" if and only if there exists some super-polynomial function $g(n)$ of
the problem size $n$ ($g(n) \succ \mathrm{SuperPoly}(n)$) and a positive integer $n_0$ such that $\forall n > n_0 : P_1 > 1 - 1/g(n)$ holds. A
probability $P_2$ is regarded as "a super-polynomially small probability" (or "a probability that is super-polynomially close
to 0") if and only if there exists some super-polynomial function $g(n)$ of the problem size $n$ ($g(n) \succ \mathrm{SuperPoly}(n)$)
and a positive integer $n_0$ such that $\forall n > n_0 : P_2 < 1/g(n)$ holds.

In the following parts of the section, we will first describe the general DOP model for our theoretical analysis. 
After that, we will present the concrete DOP analyzed in this paper. Finally, we introduce the (1 + 1) EA and 
time-variable mutation rate scheme.

2.2 A Theoretical DOP Model 

In this subsection, we define a theoretical model for dynamic optimization problems on binary search space. Briefly,
our DOP model characterizes a common feature of most DOPs investigated by the evolutionary computation community
33, 3.2 H5J I3HJ [55], that is, the global optimum of a DOP is probabilistically changing over time. Like those
DOPs intensively studied by the community, the DOP model allows uncertain events to occur at discrete time points
only and to be accomplished without delay. In this way, the shortest duration of a stationary objective function can be guaranteed,
which greatly facilitates optimization algorithms, and, in the meantime, reflects reasonable simplifications
that are widely employed when solving sophisticated DOPs in practice. Taking the dynamic vehicle routing problem 
as an example, it is difficult to always take any real-time factor into account when searching for the optimal routing. 
Instead, such factors are often temporarily collected, and contribute to the modification of objective function only 
at some discrete time points. 

Concretely, we define the DOP phase to be the time interval in which the objective function of a DOP remains 
stationary. For the sake of simplicity, we assume the duration of each DOP phase is 1 such that the time point index
$t$ ($t \in \mathbb{N}$, $t = 0$ is the starting time point) can be utilized to distinguish one DOP phase from another. In the $t$-th
DOP phase, the change with respect to the objective function occurs and finishes at time point $t$. Within the DOP
phase, the objective function, denoted by $f_t : \{0, 1\}^n \to \mathbb{R}$, remains stationary and only has a unique global optimum
in the search space. The overall goal of solving the DOP is to maximize the objective function in the presence of 
movements of the global optimum. Besides, we do not consider any constraint-based stationary objective function 
in our investigations, and models involving such settings will be left as our future work. By summarizing the above 
descriptions, we present the following definition: 

Definition 2 (DOP Model). A DOP is a maximization problem whose stationary objective function may change at
any time point $t \in \mathbb{N}$. At the $t$-th DOP phase, the fitness (value of objective function) with respect to a given solution
$x \in \{0, 1\}^n$ is given by $f_t(x)$, where $f_t : \{0, 1\}^n \to \mathbb{R}$ is the stationary objective function at the $t$-th DOP phase.

As stated, a notable characteristic of most DOPs is that their global optima may change over time. In this paper, 
we characterize the movements of global optima as a kind of pure random walks in the binary solution space, which 
is a simple and natural way of modeling stochastic movements in theory: 



Definition 3 (Bitwise Shifting Global Optimum (BSGO)). The global optimum of a DOP is called a BSGO, if
it shifts following the rule $\forall t \in \mathbb{N} : x^*_{t+1} = B_n(x^*_t)$, where $x^*_t$ is the global optimum at the $t$-th DOP phase,
$B_n : \{0, 1\}^n \to \{0, 1\}^n$ flips every bit of the input binary string with a probability of $\sigma \in (0, 1/2]$, and $\sigma$ is called the
shifting rate.

The DOPs whose unique global optima are BSGOs form the BDOP class: 

Definition 4 (BDOP Class). For a DOP following the model defined in Definition 2, if it only has a unique global
optimum and the optimum is a BSGO, then the DOP is a BDOP, and we also say that the DOP belongs to the 
BDOP class. 
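As an illustration of Definition 3, the following minimal Python sketch simulates the random walk of a BSGO and estimates how many bits of the optimum change per DOP phase. Since every bit flips with probability $\sigma$, the expected change per phase is $n\sigma$. The function names, the seed, and the parameter values are assumptions made for this example only and do not come from the model description.

```python
import random


def shift_optimum(optimum, sigma, rng):
    """Apply one BSGO step: flip each bit with probability sigma."""
    if not 0.0 < sigma <= 0.5:
        raise ValueError("the shifting rate must lie in (0, 1/2]")
    return [bit ^ 1 if rng.random() < sigma else bit for bit in optimum]


def mean_shift(n, sigma, phases, seed=0):
    """Average number of bits changed per DOP phase over a simulated walk."""
    rng = random.Random(seed)
    optimum = [rng.randint(0, 1) for _ in range(n)]
    changed = 0
    for _ in range(phases):
        shifted = shift_optimum(optimum, sigma, rng)
        changed += sum(a != b for a, b in zip(optimum, shifted))
        optimum = shifted
    return changed / phases


if __name__ == "__main__":
    # The empirical mean should be close to n * sigma (here 100 * 0.02 = 2).
    print(mean_shift(n=100, sigma=0.02, phases=3000))
```

The sketch draws each bit flip independently, which is how "flips every bit with a probability of $\sigma$" is read here.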

In the rest of the paper, theoretical investigations will be carried out on the BDOP class, where the algorithms 
for solving such optimization problems will be introduced in the next subsection. 

2.3 Time-variable Mutation Rate Schemes and Evolutionary Algorithms 

In this paper, both individual-based and population-based EAs will be employed in our theoretical analysis, and 
the aim is to demonstrate the impact of time-variable mutation rate schemes on different EAs when solving DOPs. 
Concretely, the individual-based EA studied in this paper is called the (1 + 1) EA. At each generation, the EA 
maintains a unique parent individual, and the parent individual can only generate a unique offspring individual via 
mutation; the selection operator preserves the one with better fitness between the parent and offspring individuals
(i.e., 1 parent + 1 offspring). A concrete description of the (1 + 1) EA studied in this paper is given below: 

Algorithm 1 ((1 + 1) EA). Choose the initial individual $x_0^{(p)}$ randomly by the uniform distribution over the whole
search space. Set the initial generation index $t = 0$. The $t$-th generation of the EA consists of the following steps:

• Mutation: Each bit of the parent individual $x_t^{(p)}$ is flipped with the probability of $P_m(n, t) \in [0, 1]$, where $n$
is the problem size (i.e., length of the binary string). After that, an offspring individual $x_t^{(o)}$ is obtained.

• Fitness Evaluation: Evaluate the fitness of $x_t^{(p)}$ and $x_t^{(o)}$ based on the stationary fitness function $f_t$ at the
$t$-th generation ($t$-th DOP phase).

• Selection: If $f_t(x_t^{(o)}) > f_t(x_t^{(p)})$, then set $x_{t+1}^{(p)} = x_t^{(o)}$; otherwise, set $x_{t+1}^{(p)} = x_t^{(p)}$.

If the given stopping criterion is met after the $t$-th generation, then the EA stops; otherwise, set $t = t + 1$ and a new
generation begins.
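The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the fitness function is taken to be the number of bits matching the current optimum (the BDOP class only requires a unique BSGO optimum), the mutation rate scheme is passed as a callable `p_m(n, t)`, and all function names are hypothetical.

```python
import random


def flip_bits(x, rate, rng):
    """Bitwise mutation: flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in x]


def matching_bits(x, optimum):
    """Illustrative fitness f_t: number of bits agreeing with the current optimum."""
    return sum(a == b for a, b in zip(x, optimum))


def one_plus_one_generation(parent, optimum, t, p_m, rng):
    """Run generation t and return (next parent, offspring)."""
    n = len(parent)
    offspring = flip_bits(parent, p_m(n, t), rng)       # Mutation
    f_parent = matching_bits(parent, optimum)           # parent is re-evaluated,
    f_offspring = matching_bits(offspring, optimum)     # since f_t may have changed
    next_parent = offspring if f_offspring > f_parent else parent  # Selection
    return next_parent, offspring


def run(n, sigma, p_m, generations, seed=0):
    """Alternate EA generations with BSGO shifts; return final parent and optimum."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    optimum = [rng.randint(0, 1) for _ in range(n)]
    for t in range(generations):
        parent, _ = one_plus_one_generation(parent, optimum, t, p_m, rng)
        optimum = flip_bits(optimum, sigma, rng)         # optimum of phase t + 1
    return parent, optimum


if __name__ == "__main__":
    fixed_rate = lambda n, t: 1.0 / n                    # the classical fixed scheme
    parent, optimum = run(n=50, sigma=0.001, p_m=fixed_rate, generations=500)
    print(matching_bits(parent, optimum), "of 50 bits match")
```

Passing the scheme as a callable keeps the sketch agnostic to whether the rate is fixed or varies with the generation index.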

A significant difference between the above algorithm and the (1 + 1) EA for static problems [TH] is that, at every
generation, the former evaluates not only the fitness of the offspring but also that of the parent, while the latter
only evaluates the fitness of the offspring. The reason for employing two fitness evaluations in one generation of our
(1 + 1) EA is that the fitness of the parent individual may change in response to the change of the objective function.
Another difference between the above EA and the traditional (1 + 1) EA is that the former allows the mutation rate
to vary over generations (i.e., the (1 + 1) EA adopts some time-variable mutation rate scheme), while the latter only
adopts the fixed mutation rate $P_m = 1/n$, where $n$ is the problem size. To make our description formal, the concrete
definition of the time-variable mutation rate scheme for our (1 + 1) EA is given below:

Definition 5 (Time-variable mutation rate scheme for (1 + 1) EA). The time-variable mutation rate scheme of the
(1 + 1) EA is a mapping $P_m : \mathbb{N} \times \mathbb{N} \to [0, 1]$. Such a scheme sets the mutation rate at the $t$-th generation to be $P_m(n, t)$,
where $n$ is the problem size.

In addition to studying the time-variable mutation scheme used in the above (1 + 1) EA, we also study the
time-variable mutation schemes in the context of the following (1 + $\lambda$) EA ($\lambda$ is polynomial in $n$):

Algorithm 2 ((1 + $\lambda$) EA). Choose the initial individual $x_0^{(p)}$ randomly by the uniform distribution over the whole
search space. Set the initial generation index $t = 0$. The $t$-th generation of the EA consists of the following steps:

• Mutation: The parent individual $x_t^{(p)}$ generates $\lambda$ ($\lambda \prec \mathrm{Poly}(n)$) offspring individuals $x_t^{(1)}, \dots, x_t^{(\lambda)}$
independently. When generating the $\chi$-th offspring individual $x_t^{(\chi)}$ ($\chi \in \{1, \dots, \lambda\}$), each bit of the parent
individual $x_t^{(p)}$ is flipped with the probability of $P_m(n, t, \chi) \in [0, 1]$.

• Fitness Evaluation: Evaluate the fitness of $x_t^{(p)}, x_t^{(1)}, \dots,$ and $x_t^{(\lambda)}$ based on the stationary fitness function
$f_t$ at the $t$-th generation ($t$-th DOP phase).

• Selection: If $\max\{f_t(x_t^{(1)}), \dots, f_t(x_t^{(\lambda)})\} > f_t(x_t^{(p)})$, then set $x_{t+1}^{(p)} = \arg\max_{x \in \{x_t^{(1)}, \dots, x_t^{(\lambda)}\}} f_t(x)$; otherwise,
set $x_{t+1}^{(p)} = x_t^{(p)}$.

If the given stopping criterion is met after the $t$-th generation, then the EA stops; otherwise, set $t = t + 1$ and a new
generation begins.
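One generation of Algorithm 2 can be sketched in the same spirit. The sketch below is an assumption-laden illustration: fitness is again the number of bits matching the current optimum, the scheme is a hypothetical callable `p_m(n, t, chi)` that may return a different rate for each offspring index, and ties among offspring are broken in favour of the lowest index.

```python
import random


def flip_bits(x, rate, rng):
    """Bitwise mutation: flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in x]


def matching_bits(x, optimum):
    """Illustrative fitness f_t: number of bits agreeing with the current optimum."""
    return sum(a == b for a, b in zip(x, optimum))


def one_plus_lambda_generation(parent, optimum, t, lam, p_m, rng):
    """Run generation t of a (1 + lambda) EA; return (next parent, offspring list)."""
    n = len(parent)
    # Mutation: offspring chi uses its own rate P_m(n, t, chi), chi = 1..lambda.
    offspring = [flip_bits(parent, p_m(n, t, chi), rng) for chi in range(1, lam + 1)]
    # Fitness evaluation of the parent and all offspring under the current f_t.
    f_parent = matching_bits(parent, optimum)
    best = max(offspring, key=lambda x: matching_bits(x, optimum))
    # Selection: replace the parent only if the best offspring is strictly better.
    if matching_bits(best, optimum) > f_parent:
        return best, offspring
    return parent, offspring


if __name__ == "__main__":
    rng = random.Random(0)
    parent = [rng.randint(0, 1) for _ in range(30)]
    optimum = [rng.randint(0, 1) for _ in range(30)]
    # Hypothetical scheme: offspring chi mutates with rate chi / (4 n).
    scheme = lambda n, t, chi: chi / (4.0 * n)
    parent, _ = one_plus_lambda_generation(parent, optimum, 0, 4, scheme, rng)
    print(matching_bits(parent, optimum))
```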

The (1 + $\lambda$) EA is a population-based EA adopting the offspring-population strategy. To be specific, unlike those
population-based EAs which maintain multiple parents and offspring at each generation, the (1 + $\lambda$) EA maintains
a unique parent individual and a population of offspring individuals generated from the same parent. To guarantee
that there is a unique parent at each generation, the selection operator imposes extremely high selection pressure,
and preserves the one with the best fitness among the total $1 + \lambda$ individuals (i.e., 1 parent + $\lambda$ offspring). Also,
the (1 + $\lambda$) EA can be considered as a special case of the ($\lambda$ + $\lambda$) EA, where the selection operator always copies the
selected individual $\lambda$ times to construct the population of the next generation. It is worth noting
that the (1 + $\lambda$) EA introduced above allows different offspring at the same generation to be generated via distinct
mutation rates, which offers larger freedom for the adaptation of mutation rates than using the same mutation rate
in generating all offspring individuals. Formally, the time-variable mutation rate scheme utilized by our (1 + $\lambda$) EA
is defined as follows:

Definition 6 (Time-variable mutation rate scheme for (1 + $\lambda$) EA). The time-variable mutation rate scheme of the
(1 + $\lambda$) EA is a mapping $P_m : \mathbb{N} \times \mathbb{N} \times \{1, \dots, \lambda\} \to [0, 1]$. Such a scheme sets the mutation rate for obtaining the
$\chi$-th offspring individual at the $t$-th generation to be $P_m(n, t, \chi)$, where $n$ is the problem size.

Clearly, when $\lambda = 1$, the (1 + $\lambda$) EA is equivalent to the (1 + 1) EA, and Definitions 5 and 6 are equivalent
to each other.

In our theoretical investigations, we assume that there are always enough parallel computational resources available 
to support a polynomial number of simultaneous fitness evaluations, which is critical to make the (1 + $\lambda$) EA valid
for solving DOPs. Moreover, we also assume that the $t$-th ($t \in \mathbb{N}$) generation of every EA starts at the beginning of
the $t$-th DOP phase and finishes at the end of the $t$-th DOP phase, which is important for theoretical analysis since it
avoids the degenerate cases where the objective function changes while an EA is carrying out fitness evaluations.



2.4 Measure of Time Complexity 

So far we have introduced the problem and algorithm investigated in this paper. In this subsection, we present the 
measure of performances of EAs, which is indispensable to our theoretical studies. Traditionally, the performance of 
an EA on a static optimization problem can be measured by the first hitting time [24, 25, 26] EH [53]. This concept
measures the number of generations needed by an EA to find the optimum of a static optimization problem, which
can be generalized to facilitate theoretical analysis of evolutionary dynamic optimization. For (1 + 1) and (1 + $\lambda$) EAs
on DOPs, we formally define the first hitting time as follows:

Definition 7 (First hitting time). On a DOP $\{f_t : t \in \mathbb{N}\}$, the first hitting time of a (1 + $\lambda$) EA ($\lambda \in \mathbb{N}^+$ is
polynomial in $n$), denoted by $\tau$, is defined as follows:

$$\tau := \min\left\{t \ge 0 : (x_t^{(p)} = x^*_t) \vee (x_t^{(1)} = x^*_t) \vee \dots \vee (x_t^{(\lambda)} = x^*_t)\right\}, \tag{1}$$

where $x_t^{(p)}$ is the parent individual at the $t$-th generation, and $x_t^{(1)}, \dots, x_t^{(\lambda)}$ are the $\lambda$ offspring individuals generated
by $x_t^{(p)}$. Setting $\lambda = 1$ in the above definition and replacing the notation $x_t^{(1)}$ with $x_t^{(o)}$ yield the first hitting time of
the (1 + 1) EA.
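To make Definition 7 concrete, the following Python sketch measures the first hitting time of a simulated (1 + $\lambda$) EA on a BDOP. It is only a sketch under assumptions: the fitness is the number of bits matching the current optimum, the run is capped by a hypothetical `max_generations` budget (returning `None` if the optimum is not hit in time), and all names are illustrative.

```python
import random


def flip_bits(x, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in x]


def matching_bits(x, optimum):
    """Illustrative fitness f_t: number of bits agreeing with the current optimum."""
    return sum(a == b for a, b in zip(x, optimum))


def first_hitting_time(n, sigma, lam, p_m, max_generations, seed=0):
    """Return the smallest t at which the parent or an offspring equals x*_t,
    or None if no hit occurs within `max_generations` generations."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    optimum = [rng.randint(0, 1) for _ in range(n)]
    for t in range(max_generations):
        offspring = [flip_bits(parent, p_m(n, t, chi), rng) for chi in range(1, lam + 1)]
        # Hitting event of Definition 7: parent or any offspring equals x*_t.
        if parent == optimum or any(x == optimum for x in offspring):
            return t
        best = max(offspring, key=lambda x: matching_bits(x, optimum))
        if matching_bits(best, optimum) > matching_bits(parent, optimum):
            parent = best
        optimum = flip_bits(optimum, sigma, rng)   # BSGO shift to phase t + 1
    return None


if __name__ == "__main__":
    fixed = lambda n, t, chi: 1.0 / n
    print(first_hitting_time(n=10, sigma=0.01, lam=2, p_m=fixed, max_generations=100000))
```

Setting `lam=1` gives the (1 + 1) EA case, in line with the last sentence of the definition.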

Based on the problem and algorithms introduced in this section, in the rest of the paper we study time-variable
mutation rate schemes in terms of first hitting times of EAs. 



3 (1 + 1) EA with Time-Variable Mutation Schemes



During the past decade, a number of studies have been dedicated to proving or validating that some specific time-variable
mutation rate schemes are helpful for improving performances of EAs, although it is unclear whether this is generally
true. In this section, we present several theoretical results concerning the performance of the (1 + 1) EA with different
time-variable mutation rate schemes on BDOPs. In the first subsection, we offer a general result, demonstrating that
the BDOPs with shifting rate $\sigma = \omega(\log n/n^2)$ cannot be solved efficiently by the (1 + 1) EA with any time-variable
mutation rate scheme satisfying $\forall t \in \mathbb{N} : P_m(n, t) \in [0, 1 - 1/\log n]$. In the second subsection, we extend the above
result on a specific BDOP called the BitMatchinGd problem, and show that the (1 + 1) EA with any time-variable
mutation rate scheme (i.e., $\forall t \in \mathbb{N} : P_m(n, t) \in [0, 1]$) fails to optimize the BitMatchinGd problem efficiently when
the shifting rate is $\sigma = \omega(\log n/n^2)$.



3.1 A General Result 

The BDOPs studied in this subsection have shifting rate $\sigma = \omega(\log n/n^2)$, which implies that the shifting rate
of the BDOPs satisfies $\lim_{n \to \infty} (\log n/n^2)/\sigma = 0$. The average movement of the global optimum at each DOP
phase (which is equivalent to a generation of the EA, as mentioned), measured by the Hamming distance, is $n\sigma$, whose
asymptotic order is larger than $\Theta(\log n/n)$. This includes the case that at each DOP phase the global optimum changes by less than a bit on average.
Intuitively, such a small movement speed of the global optimum seems not to seriously affect the optimization process,
and the (1 + 1) EA may probably cope with such situations by switching to appropriate mutation rates. In this
subsection, Theorem 1 shows that even such small movements have a significant influence on the first
hitting time of the (1 + 1) EA with different time-variable mutation schemes. The main result in this subsection is
formally presented as follows:

Theorem 1. Given any BDOP with shifting rate $\sigma = \omega(\log n/n^2)$ and any time-variable mutation rate scheme
$\{P_m(n, t) \in [0, 1 - 1/\log n] : t \in \mathbb{N}\}$, the first hitting time of the (1 + 1) EA is super-polynomial with an overwhelming
probability.

The above theorem holds when $\forall t \in \mathbb{N} : P_m(n, t) \in [0, 1 - 1/\log n]$, which means that the largest mutation
rate that the EA can switch to is $1 - 1/\log n$. Within the interval $[0, 1 - 1/\log n]$, the (1 + 1) EA can adjust its
time-variable mutation rate freely following an oracle, i.e., at each generation the EA can even choose the mutation
rate best suiting the current situation. To prove the theorem, we first establish the following lemma:

Lemma 1. Given any BDOP with shifting rate $\sigma = \omega(\log n/n)$ and any time-variable mutation rate scheme
$\{P_m(n, t) \in [0, 1 - 1/\log n] : t \in \mathbb{N}\}$, the first hitting time of the (1 + 1) EA is super-polynomial with an overwhelming
probability.
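To see how the shifting-rate condition of Lemma 1 differs from that of Theorem 1, it may help to work through one concrete rate. The choice $\sigma = n^{-3/2}$ below is an assumption picked purely for illustration; it lies in $(0, 1/2]$ for all $n \ge 2$.

```latex
\begin{align*}
\sigma &= n^{-3/2}, \\
\frac{\log n / n^2}{\sigma} &= \frac{\log n}{\sqrt{n}} \to 0
  \quad\Longrightarrow\quad \sigma = \omega(\log n / n^2), \\
\frac{\sigma}{\log n / n} &= \frac{1}{\sqrt{n}\,\log n} \to 0
  \quad\Longrightarrow\quad \sigma = o(\log n / n), \\
\mathbb{E}\bigl[H(x^*_t, x^*_{t+1})\bigr] &= n\sigma = n^{-1/2} < 1 \quad (n > 1).
\end{align*}
```

Under this choice the condition of Theorem 1 holds while that of Lemma 1 does not, and the optimum moves by less than one bit per phase on average, which is the regime of small movements discussed above.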

Proof Idea of Lemma 1. Generally speaking, when proving Lemma 1, we should keep in mind that the (1 + 1)
EA can adjust its time-variable mutation rate following an oracle. To be specific, we must carry out a best-case
analysis so as to bound all potential behaviors of catching the global optimum of a BDOP. Given a solution $x$, define
the number of matching bits of the solution (to the global optimum) to be the problem size $n$ minus the Hamming
distance between the solution and the current global optimum $x^*_t$ (i.e., $n - H(x, x^*_t)$). A formal definition of the Hamming
distance $H(\cdot, \cdot)$ mentioned above is given below:

Definition 8 (Hamming distance). The Hamming distance between two solutions $x = (s_1, \dots, s_n)$ and $y = (s'_1, \dots, s'_n)$
($x, y \in \{0, 1\}^n$) is given by $H(x, y) := \sum_{i=1}^{n} |s_i - s'_i|$.
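Definition 8 and the derived number of matching bits translate directly into code. The short Python sketch below uses hypothetical function names and represents solutions as lists of 0/1 integers.

```python
def hamming_distance(x, y):
    """H(x, y): sum over positions of |s_i - s'_i| for two equal-length bit lists."""
    if len(x) != len(y):
        raise ValueError("solutions must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def matching_bits(x, optimum):
    """Number of matching bits: n - H(x, x*_t)."""
    return len(x) - hamming_distance(x, optimum)


if __name__ == "__main__":
    x = [0, 1, 1, 0]
    optimum = [1, 1, 0, 0]
    print(hamming_distance(x, optimum), matching_bits(x, optimum))
```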

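Let $x_t^{(p)}$ and $x_t^{(o)}$ be the parent and offspring individuals of the $t$-th generation ($t \in \mathbb{N}$) of the (1 + 1) EA, respectively.
Let $x_t$ be the one with higher fitness between $x_t^{(p)}$ and $x_t^{(o)}$ (at the $t$-th generation):

$$\forall t \in \mathbb{N} : x_t := \begin{cases} x_t^{(p)}, & \text{if } f_t(x_t^{(p)}) \ge f_t(x_t^{(o)}); \\ x_t^{(o)}, & \text{if } f_t(x_t^{(p)}) < f_t(x_t^{(o)}). \end{cases}$$

Let $N_t^{(p)} := n - H(x_t^{(p)}, x^*_t)$, $N_t^{(o)} := n - H(x_t^{(o)}, x^*_t)$, and

$$N_t := n - H(x_t, x^*_t). \tag{2}$$

It follows from the above definitions that

$$\max\{N_t^{(p)}, N_t^{(o)}\} \ge N_t \ge \min\{N_t^{(p)}, N_t^{(o)}\}. \tag{3}$$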

The definition of $N_t$ ($t \in \mathbb{N}$) also implies that the overall mapping from $N_t$ to $N_{t+1}$ is determined not only by the
EA, but also by the optimum-shifting in the BDOP. The reason is that the (1 + 1) EA only maps one solution to another,
while the optimum-shifting in the BDOP can be considered as a mapping that describes the movements of the global
optimum. The above two factors, together with $N_t$, determine the Hamming distance between a solution and the
current global optimum at the $(t + 1)$-th generation.

To explain the proof idea, we define a number axis (real number line) with respect to the number of matching
bits (to the current global optimum), ranging from 0 to $n$. As illustrated in Fig. 1, we further define several intervals
on the axis:

Definition 9. The First Forbidden Interval and First LongJump Interval are defined as follows:

1. First Forbidden Interval: The First Forbidden Interval is the interval $F_1 := [n - n/\log^3 n, n]$, where $n$ is
the problem size.

2. First LongJump Interval: The First LongJump Interval is the interval $L_1 := [0, n/\log^2 n]$.

Figure 1: Interval Decomposition for Lemma 1

In addition, we let $O_1 := (n/\log^2 n, n - n/\log^3 n)$ such that $L_1 \cup O_1 \cup F_1 = [0,n]$. The above decomposition
is illustrated in Fig. 1. The purpose of defining the three intervals is to quantitatively characterize three general
states of the (1 + 1) EA. Intuitively speaking, for the First Forbidden Interval, $N_t \in F_1$ describes the
situation in which the better one between the parent and offspring at the $t$-th generation is very close to the moving global
optimum of a BDOP. Similarly, belonging to the First LongJump Interval ($N_t \in L_1$) indicates the situation in which
the solution found by the EA is extremely far away from the global optimum, but can reach the First Forbidden
Interval $F_1$ by an extremely long jump resulting from a very large mutation rate (i.e., $1 - 1/\log n$). Finally, belonging
to the interval $O_1$ represents the situation in which the EA is still far from finding the optimum, no matter what concrete
mutation rate the EA adopts.

In our best-case analysis for proving Lemma 1, as soon as the EA has found a solution belonging to the intervals
$F_1$ or $L_1$, we optimistically consider that the EA has reached the global optimum. By noting the important fact,
stated in an earlier definition, that the EA starts in $O_1$ (i.e., the number of matching bits of the initial solution belongs to $O_1$) with
an overwhelming probability, to prove Lemma 1 we only need to prove that the transition from $O_1$ to $F_1$ or $L_1$ is
very unlikely to happen.

Formally, we need to prove the following propositions when proving Lemma 1:

Proposition A1.1: $P(N_0 \in F_1 \cup L_1) \prec 1/\mathrm{SuperPoly}(n)$.

Proposition A1.2: $\forall t \in \mathbb{N}^+ : P(N_t^{(P)} \in F_1 \cup L_1 \mid N_{t-1} \in O_1) \prec 1/\mathrm{SuperPoly}(n)$.

Proposition A1.3: $\forall t \in \mathbb{N}^+ : P(N_t^{(O)} \in F_1 \cup L_1 \mid N_{t-1} \in O_1) \prec 1/\mathrm{SuperPoly}(n)$.



For the detailed proof following the above sketch, interested readers can refer to the appendix of the paper. □

From Lemma 1, when proving Theorem 1 we only need to cope with smaller shifting rates satisfying the conditions
$\sigma = O(\log n/n)$ and $\sigma = \omega(\log n/n^2)$. Given the condition $\sigma = O(\log n/n)$, we let $\sigma < \delta \log n/n$ in the proof,
where $\delta$ is an arbitrary positive constant. Having identified the above conditions, we define $\gamma = \gamma(n, \sigma)$ as follows:

$$\gamma = \gamma(n, \sigma) := \min\left\{ \frac{n}{\log n},\ \sigma \cdot \frac{n^2}{\log n} \right\}.$$

Further, let $G = G(n, \sigma)$ be defined by

$$G = G(n, \sigma) := \gamma^{4/7} \log n. \tag{4}$$



The purpose of introducing the notations $\gamma$ and $G$ is to further define subintervals of the interval $[0, n]$ with respect
to the number of matching bits of a solution to the current optimum, in addition to $F_1$, $L_1$ and $O_1$. Concretely, we
consider the following new intervals:



Definition 10. Some intervals are introduced below:

1. Second Forbidden Interval: The Second Forbidden Interval is the interval $F_2 := [n - G, n]$, where $n$ is the
problem size.

2. Primary Adjacent Intermediate Interval: The Primary Adjacent Intermediate Interval is the interval
$A_1 := [n - 2G, n - G)$.

3. Secondary Adjacent Intermediate Interval: The Secondary Adjacent Intermediate Interval is the interval
$A_2 := [n - 3G, n - 2G)$.

4. Second LongJump Interval: The Second LongJump Interval is the interval $L_2 := [0, 4G]$, where $n$ is the
problem size.

5. Primary Remote Intermediate Interval: The Primary Remote Intermediate Interval is the interval $B_1 :=
(4G, 5G]$.

6. Secondary Remote Intermediate Interval: The Secondary Remote Intermediate Interval is the interval
$B_2 := (5G, 6G]$.

Figure 2: Interval Decomposition for Theorem 1

The above intervals, along with $F_1$ and $L_1$, are illustrated in Fig. 2. Generally speaking, the above interval
decomposition inherits and generalizes the intuitive idea utilized in the proof of Lemma 1. Similar to $F_1$ and $L_1$ in
the proof of Lemma 1, the Second Forbidden Interval $F_2$ is used to characterize the state in which a solution found by
the EA is very close to the current global optimum. The Second LongJump Interval $L_2$ is used to characterize the
state in which a solution found by the EA is extremely far away from the current global optimum, and is therefore very likely
to reach the optimum by employing an extremely large mutation rate (e.g., $1 - 1/\log n$). However, due to
the different shifting rates ($\omega(\log n/n)$ holds in Lemma 1 while $\omega(\log n/n^2)$ and $O(\log n/n)$ hold in Theorem 1),
the concrete sizes of $F_2$ and $L_2$ are different from those of $F_1$ and $L_1$, respectively. In addition to the
"forbidden" and "long-jump" intervals, here we employ some extra intervals which serve as intermediate intervals for
reaching the "forbidden" and "long-jump" intervals $F_2$ and $L_2$. Concretely, at the very beginning of optimization,
the solution found by the EA belongs to the interval $O_1$ defined in the last subsection. To find a solution in $F_2$ by
employing a small mutation rate, the EA must first find some solution in the Secondary Adjacent Intermediate Interval
$A_2$. Afterwards, the EA needs to travel through the Primary Adjacent Intermediate Interval $A_1$, i.e., to find
solutions that exceed $A_1$ and finally reach $F_2$. On the other hand, to find a solution in $F_2$ by employing an extremely
large mutation rate, the EA has to find some solution in the "long-jump" interval $L_2$ first. Nevertheless, to reach
$L_2$, the EA has to find some solution in the Secondary Remote Intermediate Interval $B_2$ first, and afterwards travel
through the Primary Remote Intermediate Interval $B_1$ so as to reach $L_2$. In our best-case analysis, once the EA has
found a solution belonging to the intervals $F_2$ or $L_2$, we optimistically consider that the EA has reached the global
optimum.
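
The interval decomposition of Definition 10 can be sketched in Python as follows. This is only an illustration under stated assumptions: the formula used for $\gamma$, namely $\min\{n/\log n, \sigma n^2/\log n\}$, is read off the damaged source equation; the natural logarithm is assumed; and the label `"middle"` for the region between $B_2$ and $A_2$ is a hypothetical name, not one used in the paper.

```python
import math


def gamma(n, sigma):
    # ASSUMPTION: gamma := min{ n / log n, sigma * n^2 / log n }, natural log.
    return min(n / math.log(n), sigma * n * n / math.log(n))


def big_g(n, sigma):
    """G = gamma^(4/7) * log n, the common width of the intervals."""
    return gamma(n, sigma) ** (4 / 7) * math.log(n)


def classify(num_matching, n, sigma):
    """Name of the interval of [0, n] that contains num_matching."""
    g = big_g(n, sigma)
    if 9 * g >= n:
        raise ValueError("n is too small: the intervals would overlap")
    if num_matching <= 4 * g:
        return "L2"      # Second LongJump Interval [0, 4G]
    if num_matching <= 5 * g:
        return "B1"      # Primary Remote Intermediate Interval (4G, 5G]
    if num_matching <= 6 * g:
        return "B2"      # Secondary Remote Intermediate Interval (5G, 6G]
    if num_matching < n - 3 * g:
        return "middle"  # hypothetical label for the remaining region
    if num_matching < n - 2 * g:
        return "A2"      # Secondary Adjacent Intermediate Interval
    if num_matching < n - g:
        return "A1"      # Primary Adjacent Intermediate Interval
    return "F2"          # Second Forbidden Interval [n - G, n]


n = 2 ** 20
sigma = n ** -1.5  # lies between log n / n^2 and log n / n for this n
print(classify(n // 2, n, sigma))  # a uniformly random start is typically here
```

A uniformly random initial solution has about $n/2$ matching bits, which falls in the region between $B_2$ and $A_2$; this matches the role of Proposition B1.1 in the proof sketch.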

So far we have briefly introduced the interval decomposition utilized in the proof of Theorem 1. Next, we provide
the detailed sketch for proving Theorem 1.

Proposition B1.1: $P(N_0 \in A_2 \cup A_1 \cup F_2 \cup B_2 \cup B_1 \cup L_2) \prec 1/\mathrm{SuperPoly}(n)$.

Proposition B1.2: $\forall t$ that satisfies $t \in \mathbb{N}^+$ and $P_m(n,t) < \gamma^{1/14} \log n/n$,

$$P\left( |N_t - N_{t-1}| < \delta \gamma^{1/7} \log n \ \middle|\ P_m(n,t) < \frac{\gamma^{1/14} \log n}{n},\ \sigma < \frac{\delta \log n}{n} \right) \succ 1 - \frac{1}{\mathrm{SuperPoly}(n)}$$

holds.

Proposition B1.3:

(B1.3a) To reach $F_2$ within a polynomial number of generations, with an overwhelming probability the
EA must reach $A_2$ or $L_2$ first.

(B1.3b) To reach $L_2$ within a polynomial number of generations, with an overwhelming probability the
EA must reach $B_2$ first.

Proposition B1.4:

(B1.4a) $\forall t$ that satisfies $t \in \mathbb{N}^+$ and $N_{t-1} \in A_2 \cup A_1 \cup F_2$,

$$P\left( N_t - N_{t-1} < \gamma^{1/7} \ \middle|\ N_{t-1} \in A_2 \cup A_1 \cup F_2,\ \sigma < \frac{\delta \log n}{n} \right) \succ 1 - \frac{1}{\mathrm{SuperPoly}(n)}$$

holds.

(B1.4b) $\forall t$ that satisfies $t \in \mathbb{N}^+$ and $N_{t-1} \in B_2 \cup B_1 \cup L_2$,

$$P\left( N_{t-1} - N_t < \gamma^{1/7} \ \middle|\ N_{t-1} \in B_2 \cup B_1 \cup L_2,\ \sigma < \frac{\delta \log n}{n} \right) \succ 1 - \frac{1}{\mathrm{SuperPoly}(n)}$$

holds.

Proposition B1.5: To travel through $A_1$, the EA will spend a super-polynomial number of generations with
an overwhelming probability.

One may note that Propositions B1.1, B1.3 and B1.5 respond to our discussion in the previous paragraph. In
addition, Propositions B1.2 and B1.4 are both formal propositions concerning the gain of the better solution (between
the parent and offspring at the $t$-th generation) found by the EA, compared with that of the previous generation.
However, the two propositions tackle different conditions concerning the shifting rate of the BDOP (i.e., $\sigma$), the mutation
rate ($P_m(n,t)$) and so on, which are crucial for the formal proof of Propositions B1.3 and B1.5. A detailed proof of
Theorem 1 is in the appendix of the paper.

Theorem 1 presents a general result showing that any BDOP with shifting rate $\omega(\log n/n^2)$ is essentially hard
for the (1 + 1) EA with any time-variable mutation rate scheme $\{P_m(n,t) : t \in \mathbb{N}\}$ satisfying $\forall t \in \mathbb{N} : P_m(n,t) \in
[0, 1 - 1/\log n]$. These essentially hard BDOPs, which characterize the movements of the global optimum as random
walks in the binary solution space, demonstrate the potential weakness of time-variable mutation rate schemes on
DOPs with a moving optimum. However, it is worth noting that the results obtained in this section focus on those
time-variable mutation rate schemes that satisfy $\forall t \in \mathbb{N} : P_m(n,t) \in [0, 1 - 1/\log n]$, i.e., the largest value that the
mutation rate can take is $1 - 1/\log n$. It is unclear whether an even larger mutation rate that exceeds $1 - 1/\log n$
would help, though knowing when to apply such an extreme mutation rate relies on some ideal oracle (i.e., we know
the consequence of such decisions in advance). To further validate the potential failure of all time-variable mutation
rate schemes on some specific BDOP, we will extend Theorem 1 for the BDOP class to a concrete example, named
the BitMatching$_D$ problem.

3.2 A More Precise Result on BitMatching$_D$

In this subsection, we turn our attention to a specific example of the BDOP class called BitMatching$_D$. Concretely,
the BitMatching$_D$ problem is a BDOP using the following stationary function at the $t$-th DOP phase:

$$\text{BitMatching}_D(x, t) := n - H(x, x_t^*), \tag{5}$$

where $x_t^*$ is the global optimum of BitMatching$_D$ at the $t$-th DOP phase.

To gain a more comprehensive understanding of the performance of time-variable mutation rate schemes on
BitMatching$_D$, here we consider the time complexity of finding approximate solutions of a certain quality,
instead of considering only the time complexity of finding the moving global optimum. Here, the following specific
characteristic of approximate solutions of DOPs, named the best-case DOP approximation ratio, is taken into
account:

Definition 11 (Best-case DOP approximation ratio). Suppose we have a maximization DOP $I$ and an optimization
algorithm $A$. Let $f_t(x_t^*)$ be the optimal value of the objective function of the DOP $I$ at the $t$-th generation ($t \in \mathbb{N}$), $x_t$
be the best solution found by the algorithm $A$ at the $t$-th generation, and $f_t(x_t)$ be the value of the objective function with
respect to the solution $x_t$ at the $t$-th generation. For the algorithm $A$, the ratio $\sup_{\tilde{t} \le t} f_{\tilde{t}}(x_{\tilde{t}}) / f_{\tilde{t}}(x_{\tilde{t}}^*)$ is called the best-case
DOP approximation ratio at the $t$-th generation.

Given the above definition, the definition of first hitting time in Eq. (1) can be generalized to the so-called $(1 - \epsilon)$-
first hitting time, i.e., the time for finding solutions that reach a certain best-case DOP approximation
ratio, say, $1 - \epsilon$:

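Definition 12 ($(1 - \epsilon)$-first hitting time for (1 + 1) EA). On a DOP $\{f_t : t = 1, 2, \dots\}$, the $(1 - \epsilon)$-first hitting time
of an EA, denoted by $T_\epsilon$, is defined as follows:

$$T_\epsilon := \min\left\{ t \ge 0 : \left(x_t^{(P)} \in J_t\right) \vee \left(x_t^{(O)} \in J_t\right) \right\}, \tag{6}$$

where $J_t := \left\{ x_t : f_t(x_t)/f_t(x_t^*) \ge 1 - \epsilon \right\}$ and $\epsilon \in [0, 1)$.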
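
To make Definition 12 concrete, the following Python sketch simulates a (1 + 1) EA on BitMatching$_D$ and reports the first generation at which the parent or the offspring reaches the ratio $1 - \epsilon$. It is a minimal sketch under assumptions that the source does not fix: the optimum is shifted after selection in each generation, the parent is kept on ties, and all function and parameter names (`first_hitting_time`, `mutation_rate`, `x0`, `opt0`) are illustrative.

```python
import random


def flip_bits(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def matching_bits(x, optimum):
    """BitMatching_D fitness: n - H(x, optimum)."""
    return sum(1 for a, b in zip(x, optimum) if a == b)


def first_hitting_time(n, sigma, mutation_rate, epsilon, max_gens, rng,
                       x0=None, opt0=None):
    """First generation t at which parent or offspring has at least
    (1 - epsilon) * n matching bits; None if max_gens is exceeded.

    mutation_rate is a callable (n, t) -> rate (a time-variable scheme).
    """
    optimum = list(opt0) if opt0 is not None else [rng.randint(0, 1) for _ in range(n)]
    parent = list(x0) if x0 is not None else [rng.randint(0, 1) for _ in range(n)]
    target = (1 - epsilon) * n  # the optimal fitness of BitMatching_D is n
    for t in range(max_gens + 1):
        offspring = flip_bits(parent, mutation_rate(n, t), rng)
        fp = matching_bits(parent, optimum)
        fo = matching_bits(offspring, optimum)
        if fp >= target or fo >= target:
            return t
        if fo > fp:  # ASSUMPTION: the parent is kept on ties
            parent = offspring
        # ASSUMPTION: the optimum shifts after selection (bitwise, rate sigma)
        optimum = flip_bits(optimum, sigma, rng)
    return None


n = 32
fixed_rate = lambda n, t: 1.0 / n
print(first_hitting_time(n, 1.0 / n ** 2, fixed_rate, 0.0, 100000, random.Random(0)))
```

With the same random seed, a larger $\epsilon$ can only stop the run earlier, because the trajectory is identical up to the first hit.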

The BitMatching$_D$ problem is a special case of the BDOP class. By generalizing the proof ideas mentioned in
the last subsection, we are able to obtain a number of new theoretical results. The following lemma can be derived
from Lemma 1:

Lemma 2. Given the BitMatching$_D$ problem with shifting rate $\sigma = \omega(\log n/n)$ and any time-variable mutation
rate scheme $\{P_m(n,t) \in [0, 1] : t \in \mathbb{N}\}$ of the (1 + 1) EA, the $(1 - 1/\log^3 n)$-first hitting time of the (1 + 1) EA is
super-polynomial with an overwhelming probability.

Based on Lemma 2 and the proof idea of Theorem 1, we can obtain the following theorem:

Theorem 2. Given the BitMatching$_D$ problem with shifting rate $\sigma = \omega(\log n/n^2)$, and any time-variable mutation
rate scheme $\{P_m(n,t) \in [0, 1] : t \in \mathbb{N}\}$ of the (1 + 1) EA, the $(1 - \gamma^{4/7} \log n/n)$-first hitting time of the (1 + 1) EA is
super-polynomial with an overwhelming probability, where $\gamma$ is defined by

$$\gamma = \gamma(n, \sigma) := \min\left\{ \frac{n}{\log n},\ \sigma \cdot \frac{n^2}{\log n} \right\}. \tag{7}$$

The proofs of Lemma 2 and Theorem 2 are similar to those of Lemma 1 and Theorem 1, respectively. Interested
readers can refer to the appendix for details. Lemma 2 and Theorem 2 directly lead to the following corollary about the
most commonly used fixed mutation rate $1/n$, which was proven by Droste [17]:

Corollary 1. The first hitting time of the (1 + 1) EA with the fixed mutation rate $1/n$ on the BitMatching$_D$ problem
with $\sigma = \omega(\log n/n^2)$ is super-polynomial with an overwhelming probability.

Combining the above corollary with Theorem 2, we obtain an interesting result:

Corollary 2. Given the BitMatching$_D$ problem with $\sigma = \omega(\log n/n^2)$, both the (1 + 1) EA with any time-variable
mutation scheme and the (1 + 1) EA with the most commonly used fixed mutation rate $1/n$ perform inefficiently.

Clearly, the corollary indicates that no time-variable mutation rate scheme significantly outperforms the most
commonly used fixed mutation rate $1/n$ when the (1 + 1) EA is employed as the optimizer of the BitMatching$_D$
problem with shifting rate $\sigma = \omega(\log n/n^2)$. Moreover, Droste [17] proved that the (1 + 1) EA with the fixed mutation
rate $1/n$ can reach the global optimum of BitMatching$_D$ with shifting rate $\sigma = O(\log n/n^2)$ with a polynomial
average first hitting time:

Theorem 3 (Droste [17]). The mean first hitting time of the (1 + 1) EA with the fixed mutation rate $P_m = 1/n$ on
BitMatching$_D$ with $\sigma = O(\log n/n^2)$ is polynomial in the problem size $n$.

Given that fixed mutation rates are only special cases of time-variable mutation schemes, the above result can
also be interpreted as follows: there exists some time-variable mutation scheme with which the (1 + 1) EA can solve
BitMatching$_D$ with $\sigma = O(\log n/n^2)$ with a polynomial mean first hitting time. It follows from Theorem 2 that
the BitMatching$_D$ problem with $\sigma = \Theta(\log n/n^2)$ is the hardest BitMatching$_D$ on which a (1 + 1) EA can
guarantee efficient performance. In the next section, we show that by adopting a population, a (1 + λ) EA with some
time-variable mutation schemes can break the above limitation. However, when the shifting rate $\sigma = \Omega(\log n/n^2)$, a
(1 + λ) EA with different time-variable mutation schemes will still encounter a bottleneck when optimizing BDOPs.

4 (1 + λ) EA with Time-Variable Mutation Schemes

So far we have analyzed the effectiveness of time-variable mutation schemes in the context of the (1 + 1) EA. In this
section, our analysis will be carried out in the context of a population-based EA called the (1 + λ) EA. A case study
on the BitMatching$_D$ problem will be given to show the overall impact of population and time-variable mutation
schemes.



4.1 A General Result 



The (1 + λ) EA studied in this paper follows the algorithmic framework presented earlier in the paper. The time-variable mutation
schemes for the (1 + λ) EA, defined in Definition 5, allow the EA to utilize distinct mutation rates in generating
different offspring in the same generation. However, when solving BDOPs, such an EA may still be inefficient when
the shifting rate of a BDOP exceeds $O(\log n/n)$:

Theorem 4. Given any BDOP with shifting rate $\sigma = \omega(\log n/n)$ and any time-variable mutation rate scheme
$\{P_m(n, t, \chi) \in [0, 1 - 1/\log n] : t \in \mathbb{N}, \chi \in \{1, \dots, \lambda\}\}$, the first hitting time of the (1 + λ) EA is super-polynomial
with an overwhelming probability, where the offspring size $\lambda$ is a polynomial function of $n$.

The proof of Theorem 4 is a direct generalization of the proof idea of Lemma 1. Interested readers can refer to
the appendix for details.

Over the BDOP class, Theorem 4 shows a theoretical limitation for time-variable mutation schemes
associated with the (1 + λ) EA. Nevertheless, the theorem sheds some light on potentially efficient performance of
the (1 + λ) EA with time-variable mutation schemes on those BDOPs whose shifting rate is between $O(\log n/n)$
and $\Theta(\log n/n^2)$ (note that Theorem 1 tells us that the (1 + 1) EA performs inefficiently on these BDOPs). In fact,
compared with the (1 + 1) EA, the (1 + λ) EA has indeed been strengthened by two factors. First, the offspring-population
strategy, which is a specific way of utilizing a population, offers larger selection pressure for the (1 + λ) EA.
When optimizing DOPs, this feature is very helpful for tracking the movement of the global optimum. Second, in one
generation the (1 + λ) EA is capable of exploring different subsets of the search space via distinct step sizes. Owing
to these two factors, it can be expected that the (1 + λ) EA can significantly outperform the (1 + 1) EA on some
BDOPs. In the next subsection, we present such a theoretical example.



4.2 Case Studies on BitMatching$_D$

As in Section 3.2, we still employ the BitMatching$_D$ problem as an example of the BDOP class. We show that the
(1 + λ) EA with time-variable mutation schemes can improve on the performance of the (1 + 1) EA. Meanwhile, by the
same theoretical result we are able to demonstrate that the general limitation of the (1 + λ) EA over the BDOP class,
predicted by Theorem 4, is almost tight. The main result of this subsection is as follows:

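Theorem 5. Given the BitMatching$_D$ problem with shifting rate $\sigma \le 1/(5n)$, if

Cond. 1 the time-variable mutation rate scheme $\{P_m(n,t,\chi) \in [0, 1] : t \in \mathbb{N}, \chi \in \{1, \dots, \lambda\}\}$ satisfies

$$\sup_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi) = O\left(\frac{\log n}{n}\right), \qquad \inf_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi) \succ \frac{1}{\mathrm{Poly}(n)},$$

Cond. 2 the polynomial offspring size $\lambda$ of the EA satisfies

$$\lambda = \omega\left( \left(1 - \sup_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi)\right)^{-n} \left(\inf_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi)\right)^{-1} \right), \tag{8}$$

then the mean first hitting time of the (1 + λ) EA with the above time-variable mutation rate scheme is bounded from
above by $8n/5$.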

Theorem 5 demonstrates that, when a time-variable mutation rate scheme satisfies certain conditions, the
(1 + λ) EA adopting such a mutation scheme will perform efficiently on the BitMatching$_D$ problem whose shifting
rate is no larger than $1/(5n)$. The proof of the theorem utilizes the drift analysis technique proposed by He and Yao [23]:

Lemma 3 (Drift Analysis [23]). Let $\xi_t$ ($t \in \mathbb{N}$) be the population at the $t$-th generation of the EA, $D(\xi_t, t)$ be the
distance metric measuring the distance between the population $\xi_t$ and the global optimum at the $t$-th generation, and
$\{D(\xi_t, t) : t = 0, 1, \dots\}$ be a super-martingale describing an EA. If for any time $t = 1, 2, \dots$ with $D(\xi_t, t) > 0$ we have

$$E\left[ D(\xi_t, t) - D(\xi_{t+1}, t+1) \mid \xi_t \right] \ge c_l > 0,$$

then the mean first hitting time satisfies

$$E[T \mid \xi_0] \le \frac{D(\xi_0, 0)}{c_l} \le \frac{\sup_{X \in V,\, t \in \mathbb{N}} D(X, t)}{c_l},$$

where $\xi_0$ is the initial population of the EA and $V$ is the set of all populations.
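
The bound in Lemma 3 can be illustrated with a toy process in Python. The sketch below is not the EA analysed in the paper: it assumes a hypothetical process whose distance decreases by one with probability $p$ and otherwise stays unchanged, so the drift is exactly $p$ and the drift bound is tight. The function names are illustrative.

```python
def drift_upper_bound(max_distance, drift_lower_bound):
    """Upper bound D_max / c_l on the mean first hitting time (Lemma 3)."""
    if drift_lower_bound <= 0:
        raise ValueError("the drift lower bound must be positive")
    return max_distance / drift_lower_bound


def exact_mean_hitting_time(d0, p):
    """Toy process: the distance drops by 1 with probability p, else stays.

    The mean time satisfies E[T_d] = E[T_{d-1}] + 1/p with E[T_0] = 0.
    """
    expected = 0.0
    for _ in range(d0):
        expected += 1.0 / p
    return expected


print(exact_mean_hitting_time(10, 0.25))  # 40.0
print(drift_upper_bound(10, 0.25))        # 40.0
print(drift_upper_bound(100, 5 / 8))      # 160.0
```

The last line mirrors the arithmetic behind Theorem 5: a maximal distance of $n$ and a drift of at least $5/8$ give the bound $8n/5$, which is 160 for $n = 100$.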



Before providing the formal proof of Theorem 5, we introduce some notations. Let $x_t^{(P)}$ be the parent individual
of the $t$-th generation ($t \in \mathbb{N}$) of the (1 + λ) EA, and $x_t^{(1)}, \dots, x_t^{(\lambda)}$ be the $\lambda$ offspring individuals generated by $x_t^{(P)}$. Let
$x_t^{(O)}$ be the one with the highest fitness among $x_t^{(1)}, \dots, x_t^{(\lambda)}$:

$$\forall t \in \mathbb{N}: \quad x_t^{(O)} := \arg\max_{x_t^{(\chi)}:\ \chi \in \{1,\dots,\lambda\}} f_t\left(x_t^{(\chi)}\right).$$

Let $x_t$ be the one with higher fitness between $x_t^{(P)}$ and $x_t^{(O)}$:

$$\forall t \in \mathbb{N}: \quad x_t := \begin{cases} x_t^{(P)}, & \text{if } f_t(x_t^{(P)}) \ge f_t(x_t^{(O)}); \\ x_t^{(O)}, & \text{if } f_t(x_t^{(P)}) < f_t(x_t^{(O)}). \end{cases}$$

Let $N_t^{(P)} := n - H(x_t^{(P)}, x_t^*)$, $N_t^{(O)} := n - H(x_t^{(O)}, x_t^*)$, and $N_t := n - H(x_t, x_t^*)$. Specifically, for the BitMatching$_D$
problem, it is clear that $N_t$ is the larger one between $N_t^{(P)}$ and $N_t^{(O)}$, i.e., $N_t = \max\{N_t^{(P)}, N_t^{(O)}\}$.
Furthermore, define $\pi_{i,j}$ ($i, j \in \{0, 1, \dots, n\}$) and the corresponding generalized notation as follows:

$$\pi_{i,j} := P\left(N_t^{(P)} = j \mid N_{t-1} = i\right), \qquad \pi_{i,\oplus j} := P\left(N_t^{(P)} \oplus j \mid N_{t-1} = i\right),$$

where the "$\oplus$" on both sides can be replaced simultaneously by "$>$", "$<$", "$\ge$" or "$\le$". $\pi_{i,j}$ is independent of the
generation index $t$ since at each DOP phase the global optimum moves with the same shifting rate, as defined in
Definition 3. Define $p_{i,j}(t)$ ($i, j \in \{0, 1, \dots, n\}$) and the corresponding generalized notation as follows:

$$p_{i,j}(t) := P\left(N_t^{(O)} = j \mid N_t^{(P)} = i\right), \qquad p_{i,\oplus j}(t) := P\left(N_t^{(O)} \oplus j \mid N_t^{(P)} = i\right),$$

where the "$\oplus$" on both sides can be replaced simultaneously by "$>$", "$<$", "$\ge$" or "$\le$". Based upon the above lemma
and notations, we provide the proof of Theorem 5.

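Proof of Theorem 5. To apply Lemma 3, we need to define a suitable distance function, and estimate the
corresponding one step mean drift at the $t$-th ($t \in \mathbb{N}^+$) generation of the (1 + λ) EA. Given the union of the parent
and its $\lambda$ offspring as a whole population $X$, the distance function $D(X,t)$ ($t \in \mathbb{N}$), which measures the distance
between $X$ and the moving global optimum at the $t$-th generation, is formally defined by

$$D(X, t) := \min\{H(x, x_t^*) : x \in X\},$$

where $x_t^*$ is the global optimum at the $t$-th generation ($t \in \mathbb{N}^+$). Then denote by $\Delta_i(t)$ the one step mean drift
at the $t$-th generation, conditional on the largest number of matching bits found at the $(t-1)$-th generation being
$N_{t-1} = i$:

$$\Delta_i(t) := E\left[D(X_{t-1}, t-1) - D(X_t, t) \mid N_{t-1} = i\right] = \sum_{j=-i}^{n-i} j \cdot P\left(D(X_{t-1}, t-1) - D(X_t, t) = j \mid N_{t-1} = i\right),$$

which is the sum of the following four components:

$$\begin{aligned}
\Delta_i\left[N_t^{(P)} \ge i, N_t^{(O)} \le N_t^{(P)}\right] &:= E\left[D(X_{t-1}, t-1) - D(X_t, t);\ N_t^{(P)} \ge i,\ N_t^{(O)} \le N_t^{(P)} \mid N_{t-1} = i\right], \\
\Delta_i\left[N_t^{(P)} \ge i, N_t^{(O)} > N_t^{(P)}\right] &:= E\left[D(X_{t-1}, t-1) - D(X_t, t);\ N_t^{(P)} \ge i,\ N_t^{(O)} > N_t^{(P)} \mid N_{t-1} = i\right], \\
\Delta_i\left[N_t^{(P)} < i, N_t^{(O)} \le i\right] &:= E\left[D(X_{t-1}, t-1) - D(X_t, t);\ N_t^{(P)} < i,\ N_t^{(O)} \le i \mid N_{t-1} = i\right], \\
\Delta_i\left[N_t^{(P)} < i, N_t^{(O)} > i\right] &:= E\left[D(X_{t-1}, t-1) - D(X_t, t);\ N_t^{(P)} < i,\ N_t^{(O)} > i \mid N_{t-1} = i\right].
\end{aligned}$$

Formally, based on the above notations, the one step mean drift can be rewritten as

$$\Delta_i(t) = \Delta_i\left[N_t^{(P)} \ge i, N_t^{(O)} \le N_t^{(P)}\right] + \Delta_i\left[N_t^{(P)} \ge i, N_t^{(O)} > N_t^{(P)}\right] + \Delta_i\left[N_t^{(P)} < i, N_t^{(O)} \le i\right] + \Delta_i\left[N_t^{(P)} < i, N_t^{(O)} > i\right].$$

Next we estimate the four components one after another. By dividing the event "$N_t^{(P)} \ge i, N_t^{(O)} \le N_t^{(P)}$" into a
number of sub-events "$N_t^{(P)} = i, N_t^{(O)} \le N_t^{(P)}$", "$N_t^{(P)} = i+1, N_t^{(O)} \le N_t^{(P)}$", ..., "$N_t^{(P)} = n, N_t^{(O)} \le N_t^{(P)}$" whose
probabilities are all positive, we know the following fact about the first component of the one step mean drift:

$$\Delta_i\left[N_t^{(P)} \ge i, N_t^{(O)} \le N_t^{(P)}\right] = \sum_{j=i}^{n} (j - i)\,\pi_{i,j} \cdot P\left(N_t^{(P)} \ge N_t^{(O)} \mid N_t^{(P)} = j\right) \ge 0.$$

Similarly, we can divide the event "$N_t^{(P)} \ge i, N_t^{(O)} > N_t^{(P)}$" into several sub-events, and then estimate the following
component of the one step mean drift:

$$\begin{aligned}
\Delta_i\left[N_t^{(P)} \ge i, N_t^{(O)} > N_t^{(P)}\right]
&= \sum_{j=i}^{n} \sum_{k=1}^{n-j} (j - i + k)\,\pi_{i,j} \cdot P\left(N_t^{(P)} + k = N_t^{(O)} \mid N_t^{(P)} = j\right) \\
&\ge \sum_{j=i}^{n} (j - i + 1)\,\pi_{i,j} \cdot P\left(N_t^{(O)} > N_t^{(P)} \mid N_t^{(P)} = j\right) \\
&\ge \inf_{0 \le k < n,\, t' \in \mathbb{N}} P\left(N_{t'}^{(O)} > N_{t'}^{(P)} \mid N_{t'}^{(P)} = k\right) \cdot \sum_{j=i}^{n} (j - i + 1)\,\pi_{i,j} \\
&\ge \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,>k}(t') \cdot \left\{ \pi_{i,\ge i} + \sum_{j=i+1}^{n} (j - i) \binom{n-i}{j-i} \sigma^{j-i} (1 - \sigma)^{n-j+i} \right\} \\
&\ge \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,>k}(t') \cdot \left\{ \pi_{i,\ge i} + (1 - \sigma)^{n-1} (n - i)\,\sigma \right\} \\
&\ge \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') \cdot \left\{ \pi_{i,\ge i} + (1 - \sigma)^{n} \sigma \right\},
\end{aligned}$$

where we utilize the fact that $\pi_{i,j} \ge \binom{n-i}{j-i} \sigma^{j-i} (1 - \sigma)^{n-(j-i)}$ holds for all $j \ge i$. By similar calculations, we further
obtain the following component of the one step mean drift:

$$\begin{aligned}
\Delta_i\left[N_t^{(P)} < i, N_t^{(O)} \le i\right]
&= \sum_{j=0}^{i-1} \sum_{k=1}^{i-j} (j - i + k)\,\pi_{i,j} \cdot P\left(N_t^{(P)} + k = N_t^{(O)} \mid N_t^{(P)} = j\right) - \sum_{j=0}^{i-1} (i - j)\,\pi_{i,j} \cdot P\left(N_t^{(O)} \le N_t^{(P)} \mid N_t^{(P)} = j\right) \\
&\ge \sum_{j=0}^{i-1} \pi_{i,j} \sum_{k=1}^{i-j} P\left(N_t^{(P)} + k = N_t^{(O)} \mid N_t^{(P)} = j\right) - \sum_{j=0}^{i-1} (i - j)\,\pi_{i,j} \sum_{k=1}^{i-j} P\left(N_t^{(P)} + k = N_t^{(O)} \mid N_t^{(P)} = j\right) \\
&\qquad - \sum_{j=0}^{i-1} (i - j)\,\pi_{i,j} \cdot P\left(N_t^{(O)} \le N_t^{(P)} \mid N_t^{(P)} = j\right) \\
&\ge \sum_{j=0}^{i-1} \pi_{i,j} \sum_{k=1}^{i-j} P\left(N_t^{(P)} + k = N_t^{(O)} \mid N_t^{(P)} = j\right) - \sum_{j=0}^{i-1} (i - j)\,\pi_{i,j} \\
&\ge \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') \cdot \sum_{j=0}^{i-1} \pi_{i,j} - \sum_{j=0}^{i-1} (i - j)\,\pi_{i,j} \\
&\ge \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') \cdot \pi_{i,<i} - \sum_{k=1}^{i} k (i\sigma)^k \\
&= \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') \cdot \pi_{i,<i} - (i\sigma) \cdot \frac{i (i\sigma)^{i+1} - (i+1)(i\sigma)^{i} + 1}{(1 - i\sigma)^2} \\
&\ge \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') \cdot \pi_{i,<i} - (n\sigma) \cdot \frac{n (n\sigma)^{n+1} - (n+1)(n\sigma)^{n} + 1}{(1 - n\sigma)^2}.
\end{aligned}$$

Finally, the fourth component of the one step mean drift $\Delta_i(t)$, like the first component, is non-negative:

$$\Delta_i\left[N_t^{(P)} < i, N_t^{(O)} > i\right] \ge \sum_{j=0}^{i-1} \pi_{i,j} \cdot P\left(N_t^{(O)} > i \mid N_t^{(P)} = j\right) \ge 0.$$

The lower bounds of the four components yield the lower bound of $\Delta_i(t)$:

$$\begin{aligned}
\Delta_i(t) &= \Delta_i\left[N_t^{(P)} \ge i, N_t^{(O)} \le N_t^{(P)}\right] + \Delta_i\left[N_t^{(P)} \ge i, N_t^{(O)} > N_t^{(P)}\right] + \Delta_i\left[N_t^{(P)} < i, N_t^{(O)} \le i\right] + \Delta_i\left[N_t^{(P)} < i, N_t^{(O)} > i\right] \\
&\ge \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') \cdot \left\{ \pi_{i,\ge i} + (1 - \sigma)^n \sigma \right\} + \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') \cdot \pi_{i,<i} - \sum_{k=1}^{i} k (i\sigma)^k \\
&\ge \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') \cdot \pi_{i,\ge i} + \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') \cdot \pi_{i,<i} - \sum_{k=1}^{i} k (i\sigma)^k \\
&= \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') - \frac{i (i\sigma)^{i+2} - (i+1)(i\sigma)^{i+1} + (i\sigma)}{(1 - i\sigma)^2},
\end{aligned}$$

where $\sum_{k=1}^{i} k (i\sigma)^k$ is monotonically increasing with positive $i$ and $\sigma$. Given the fact $i \le n$ and the condition
$\sigma \le 1/(5n)$, we further have:

$$\begin{aligned}
\Delta_i(t) &\ge \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') - \frac{n (1/5)^{n+2} - (n+1)(1/5)^{n+1} + 1/5}{(1 - 1/5)^2} \ge \inf_{0 \le k < n,\, t' \in \mathbb{N}} p_{k,k+1}(t') - \frac{5}{16} \\
&\ge 1 - \left( 1 - \left(1 - \sup_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi)\right)^{n} \inf_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi) \right)^{\lambda} - \frac{5}{16} \\
&= \frac{11}{16} - \left( 1 - \left(1 - \sup_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi)\right)^{n} \inf_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi) \right)^{\lambda}.
\end{aligned}$$

Under the conditions $\sup_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi) = O(\log n/n)$ and $\inf_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi) \succ 1/\mathrm{Poly}(n)$,

$$\left(1 - \sup_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi)\right)^{n} \inf_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi) \succ \frac{1}{\mathrm{Poly}(n)},$$

and the above item is strictly less than 1. Hence, there exists a polynomial offspring size

$$\lambda = \omega\left( \left(1 - \sup_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi)\right)^{-n} \left(\inf_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi)\right)^{-1} \right)$$

such that the one step mean drift at the $t$-th generation ($\forall t < \tau$, $t \in \mathbb{N}^+$) has a common lower bound

$$\forall t < \tau,\ t \in \mathbb{N}^+ : \quad \Delta_i(t) \ge \frac{5}{8}.$$

According to Lemma 3, given the offspring size $\lambda$ specified by Eq. (8), the mean first hitting time of the EA under the two
conditions of Theorem 5 satisfies

$$E[\tau \mid \text{Cond. 1}, \text{Cond. 2}] \le \frac{8n}{5}.$$

Under this circumstance, the number of function evaluations for finding the global optimum for the first time, denoted
by $T$, satisfies

$$E[T \mid \text{Cond. 1}, \text{Cond. 2}] \le \frac{8n\lambda}{5}.$$

□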

The above theorem offers a sufficient condition for the (1 + λ) EA to achieve efficient performance on the
BitMatching$_D$ problem with shifting rate $\sigma \le 1/(5n)$. As a direct corollary of Theorem 5, we present the following
time complexity result for the $(1 + n^2 \log n)$ EA whose time-variable mutation scheme $\{P_m(n, t, \chi) \in [0, 1] : t \in \mathbb{N}, \chi \in
\{1, \dots, \lambda\}\}$ satisfies $\sup_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi) = \log n/n$ and $\inf_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi) = 1/n$ (note that the fixed
mutation rate $1/n$ is a special case of such time-variable schemes):

Corollary 3. Given the BitMatching$_D$ problem with shifting rate $\sigma \le 1/(5n)$, if the time-variable mutation rate
scheme $\{P_m(n, t, \chi) \in [0, 1] : t \in \mathbb{N}, \chi \in \{1, \dots, \lambda\}\}$ of the $(1 + n^2 \log n)$ EA satisfies

$$\sup_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi) = \frac{\log n}{n}, \qquad \inf_{t \in \mathbb{N},\, \chi \in \{1,\dots,\lambda\}} P_m(n,t,\chi) = \frac{1}{n},$$

then the mean number of function evaluations for finding the global optimum for the first time is bounded from above
by $8n^3 \log n/5$.
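
The setting of Corollary 3 can be explored empirically with the following Python sketch of a (1 + λ) EA on BitMatching$_D$. It is a rough illustration under assumptions not fixed by the source: the optimum shifts after selection, the parent is kept on ties, natural logarithms are used, and the scheme `alternating_rate` (odd-indexed offspring use $1/n$, even-indexed ones use $\log n/n$) is a hypothetical example of a scheme whose infimum and supremum match the corollary.

```python
import math
import random


def flip_bits(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def matching_bits(x, optimum):
    return sum(1 for a, b in zip(x, optimum) if a == b)


def one_plus_lambda_hitting_time(n, sigma, lam, rate_for, max_gens, rng):
    """First generation at which the parent or some offspring equals the
    current optimum; None if max_gens is exceeded.

    rate_for is a callable (n, t, chi) -> mutation rate of offspring chi.
    """
    optimum = [rng.randint(0, 1) for _ in range(n)]
    parent = [rng.randint(0, 1) for _ in range(n)]
    for t in range(max_gens + 1):
        best, best_fit = parent, matching_bits(parent, optimum)
        for chi in range(1, lam + 1):
            child = flip_bits(parent, rate_for(n, t, chi), rng)
            fit = matching_bits(child, optimum)
            if fit > best_fit:  # ASSUMPTION: ties keep the earlier individual
                best, best_fit = child, fit
        if best_fit == n:
            return t
        parent = best
        # ASSUMPTION: the optimum shifts after selection
        optimum = flip_bits(optimum, sigma, rng)
    return None


def alternating_rate(n, t, chi):
    # Hypothetical scheme with inf = 1/n and sup = log(n)/n.
    return 1.0 / n if chi % 2 else math.log(n) / n


n = 16
lam = math.ceil(n * n * math.log(n))
t = one_plus_lambda_hitting_time(n, 1.0 / (5 * n), lam, alternating_rate,
                                 2000, random.Random(0))
print(t, "generations; bound on the mean:", 8 * n / 5)
```

A single run only gives one sample of the hitting time, whereas the bound $8n/5$ of Theorem 5 concerns its mean.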

Theorem 5 tells us that, when the shifting rate of the global optimum is no larger than $1/(5n)$ (each movement
changes no more than 1/5 bit of the global optimum on average), the (1 + λ) EA is able to compensate for the negative
influence brought by the movement of the global optimum by generating a number of offspring individuals with
different mutation rates (in each generation), and the EA can achieve efficient performance on the BitMatching$_D$
problem. However, the following theorem shows that, when the shifting rate $\sigma$ further grows to $\omega(\log n/n)$
(each movement changes at least $\log n$ bits of the global optimum on average), even the multiple-offspring strategy and
time-variable mutation rate schemes cannot help to achieve efficient performance on the BitMatching$_D$ problem.

Theorem 6. Given the BitMatching$_D$ problem with shifting rate $\sigma = \omega(\log n/n)$, and any time-variable mutation
rate scheme $\{P_m(n, t, \chi) \in [0, 1] : t \in \mathbb{N}, \chi \in \{1, \dots, \lambda\}\}$, the first hitting time of the (1 + λ) EA is super-polynomial
with an overwhelming probability.

The proof of Theorem 6 follows the proof idea of Lemma 1 and Theorem 4. Interested readers can refer to the
appendix for details.

5 Discussions

In this section, we discuss some issues related to the theoretical results presented in previous sections.

5.1 Generalizations of Theoretical Results

Here we discuss potential ways of generalizing our theoretical results from different perspectives.

5.1.1 What if the shifting rate of a DOP is also time-variable?

Technically, the theoretical results presented so far can further be generalized to a broader class of DOPs. In
particular, we can modify the definition of Bitwise Shifting Global Optimum (BSGO) (Definition 3) by allowing the
global optimum to move with different shifting rates at different DOP phases:

Definition 13 (Bitwise Shifting Global Optimum with Time-variable Shifting Rate (BSGO-TSR)). The global
optimum of a DOP is called a BSGO-TSR if it shifts following the rule $\forall t \in \mathbb{N} : x_{t+1}^* = B_{n,t}(x_t^*)$, where
$B_{n,t} : \{0, 1\}^n \to \{0, 1\}^n$ flips every bit of the input binary string with a probability of $\sigma(t) \in (0, 1/2]$, and $\sigma(t)$ is called
the time-variable shifting rate.
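
The shifting rule of Definition 13 can be sketched in Python as follows. The name `shift_optimum` and the callable `sigma_of_t`, which maps a DOP phase $t$ to the shifting rate $\sigma(t)$, are illustrative assumptions rather than notation from the paper.

```python
import random


def shift_optimum(optimum, t, sigma_of_t, rng):
    """One BSGO-TSR step: flip each bit of the optimum with probability sigma(t)."""
    sigma = sigma_of_t(t)
    if not 0 < sigma <= 0.5:
        raise ValueError("sigma(t) must lie in (0, 1/2]")
    return [b ^ 1 if rng.random() < sigma else b for b in optimum]


# Hypothetical schedule: the shifting rate alternates between two values.
schedule = lambda t: 0.05 if t % 2 == 0 else 0.2

rng = random.Random(0)
optimum = [0] * 20
for t in range(3):
    optimum = shift_optimum(optimum, t, schedule, rng)
print(optimum)
```

A constant `sigma_of_t` recovers the original BSGO with a time-invariable shifting rate.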

The theoretical results obtained in this paper can be generalized by replacing the time-invariable shifting rate of
the BDOP with the above time-variable shifting rate, and the corresponding proofs will not be significantly different
from the existing ones. As an example, we present a generalized version of Theorem 2:

Proposition 1. Given the BitMatching$_D$ problem with shifting rate $\{\sigma(t) = \omega(\log n/n^2) : t \in \mathbb{N}^+\}$, and any time-variable
mutation rate scheme $\{P_m(n,t) \in [0, 1] : t \in \mathbb{N}\}$ of the (1 + 1) EA, the $(1 - \gamma^{4/7} \log n/n)$-first hitting time of the (1 + 1) EA
is super-polynomial with an overwhelming probability, where $\gamma$ is defined by

[Equation (9): the formula is missing from the extracted text.]

The above proposition replaces the fixed shifting rate $\sigma = \omega(\log n/n^2)$ in Theorem 2 with the time-variable shifting
rate $\{\sigma(t) = \omega(\log n/n^2) : t \in \mathbb{N}^+\}$. Similarly, we can generalize other theorems in the paper. Such generalizations
are correct since our theoretical analysis does not utilize concrete values of the shifting rate $\sigma$, but only relies on the
upper bound, lower bound or asymptotic order of $\sigma$. Hence, the original proofs of our theorems can easily be relaxed
and utilized as proofs of such generalizations. For the sake of brevity, we do not provide the detailed analysis in this
paper.

5.1.2 Characterizing all forms of adaptations by condition-variable mutation rate schemes

The time-variable mutation rate schemes studied in this paper were defined in earlier sections. In our theoretical
analysis showing the theoretical limitations of time-variable mutation rate schemes, we avoid utilizing the concrete values
of mutation rates. Instead, we optimistically consider that, with the help of an oracle, an EA can always choose the
most promising mutation rates in every generation. Such a notion can alternatively be characterized by explicitly
involving sophisticated information (e.g., the fitness of current individuals) as conditions for specifying mutation rates,
which yields the definition of condition-variable mutation rate schemes for (1 + λ) EAs:

Definition 14 (Condition-variable mutation rate scheme for (1 + λ) EA). The condition-variable mutation rate
scheme of the (1 + λ) EA is a mapping $P_m : \mathbb{N} \times \mathbb{N} \times \{1, \dots, \lambda\} \times CS \to [0, 1]$, where $CS$ is the condition space
consisting of all potential conditions that may contribute to the decision of mutation rates. Such a scheme sets the
mutation rate for obtaining the $\chi$-th offspring individual at the $t$-th generation to be $P_m(n, t, \chi, C_t)$ in the presence of $C_t$.
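
The relationship between time-variable and condition-variable schemes can be sketched with Python callables. Everything below is illustrative: the type aliases, the wrapper `as_condition_variable`, the example `fitness_aware_rate` and the condition key `"parent_fitness"` are hypothetical and only show that a condition-variable scheme receives one extra argument $C_t$.

```python
from typing import Any, Callable, Mapping

# (n, t, chi) -> mutation rate
TimeVariableScheme = Callable[[int, int, int], float]
# (n, t, chi, C_t) -> mutation rate
ConditionVariableScheme = Callable[[int, int, int, Mapping[str, Any]], float]


def as_condition_variable(scheme: TimeVariableScheme) -> ConditionVariableScheme:
    """View a time-variable scheme as a condition-variable one that ignores C_t."""
    def wrapped(n, t, chi, condition):
        return scheme(n, t, chi)
    return wrapped


def fitness_aware_rate(n, t, chi, condition):
    """Hypothetical adaptive scheme: the expected number of flipped bits
    equals the number of mismatched bits of the parent (at least one)."""
    mismatches = n - condition["parent_fitness"]
    return min(1.0, max(1, mismatches) / n)


fixed = as_condition_variable(lambda n, t, chi: 1.0 / n)
print(fixed(8, 0, 1, {}))                                    # 0.125
print(fitness_aware_rate(10, 0, 1, {"parent_fitness": 7}))   # 0.3
```

In a run of the EA, any condition-variable scheme produces one concrete rate per generation and offspring index, which is the sense in which it can be viewed as a time-variable scheme specified by an oracle.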

The above definition is general enough to characterize all forms of adaptations. Owing to the oracle notion
utilized in our paper, all theoretical results proven for time-variable mutation schemes also hold accordingly when
condition-variable schemes replace time-variable schemes. For example, the generalized versions of Theorems 1 and
4 with respect to condition-variable schemes are as follows:

Proposition 2. Given any BDOP with shifting rate $\sigma = \omega(\log n/n^2)$ and any condition-variable mutation rate
scheme $\{P_m(n, t, C_t) \in [0, 1 - 1/\log n] : t \in \mathbb{N}, C_t \in CS\}$, the first hitting time of the (1 + 1) EA is super-polynomial
with an overwhelming probability.

Proposition 3. Given any BDOP with shifting rate $\sigma = \omega(\log n/n)$ and any condition-variable mutation rate
scheme $\{P_m(n, t, \chi, C_t) \in [0, 1 - 1/\log n] : t \in \mathbb{N}, \chi \in \{1, \dots, \lambda\}, C_t \in CS\}$, the first hitting time of the (1 + λ) EA is
super-polynomial with an overwhelming probability, where the offspring size $\lambda$ is a polynomial function of $n$.

For the correctness of the propositions, a straightforward and intuitive explanation is that no matter which
condition-variable mutation rate scheme an EA adopts, in the optimization process it has to follow a time-dependent
configuration of concrete mutation rates, which can be viewed as a time-variable mutation rate scheme specified by
an oracle. Since we have proven the high failure probability of every time-variable mutation rate scheme, we cannot
expect such a concrete setting to be effective when it is specified online by some condition-variable mutation
scheme. From a technical perspective, the proofs of the theorems consider in detail all possible random transitions
between the different subsets into which the solution space is decomposed, so the decisions of any
condition-variable mutation rate scheme under different conditions in every
generation are already included in the analysis. To sum up, the theoretical results obtained in this paper are general
enough to show theoretical limitations of adaptations of mutation rates in evolutionary algorithms.
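
The oracle argument can be illustrated with a small simulation. The sketch below is a minimal example under stated assumptions, not the construction used in the proofs: it assumes a BitMatching-style fitness (the number of bits agreeing with a moving target), a target whose bits each flip independently with probability `sigma` per generation, and a (1 + 1) EA with elitist selection. The names `run_ea`, `fitness_based_rate`, and `replay_rate` are hypothetical. A fitness-dependent (condition-variable) scheme is run once while its realized mutation rates are recorded; replaying the recorded sequence as a purely time-variable scheme with the same random seed reproduces the same trajectory.

```python
import random


def run_ea(n, sigma, steps, rate_fn, seed):
    """(1+1) EA on a BitMatching-style dynamic problem (illustrative assumption).

    rate_fn(n, t, matches) returns the mutation rate used in generation t.
    Returns the realized mutation rates and the matching-bit count per generation.
    """
    rng = random.Random(seed)
    target = [rng.randint(0, 1) for _ in range(n)]
    parent = [rng.randint(0, 1) for _ in range(n)]
    rates, history = [], []
    for t in range(steps):
        # DOP change: every target bit flips independently with probability sigma.
        target = [b ^ (rng.random() < sigma) for b in target]
        matches = sum(p == q for p, q in zip(parent, target))
        rate = rate_fn(n, t, matches)
        rates.append(rate)
        # Bitwise mutation with the chosen rate.
        child = [b ^ (rng.random() < rate) for b in parent]
        child_matches = sum(c == q for c, q in zip(child, target))
        if child_matches >= matches:  # elitist selection
            parent, matches = child, child_matches
        history.append(matches)
    return rates, history


def fitness_based_rate(n, t, matches):
    """A condition-variable scheme: the rate depends on the current fitness."""
    return max(1, n - matches) / n


def replay_rate(recorded):
    """Turn a recorded rate sequence into a time-variable scheme (the 'oracle')."""
    return lambda n, t, matches: recorded[t]


rates, history = run_ea(n=30, sigma=0.05, steps=200,
                        rate_fn=fitness_based_rate, seed=1)
_, replayed = run_ea(n=30, sigma=0.05, steps=200,
                     rate_fn=replay_rate(rates), seed=1)
print(history == replayed)
```

Because the scheme itself consumes no randomness, both runs draw the same random numbers, which is why the comparison holds for any seed; any bound that holds for every time-variable scheme therefore also covers the recorded run.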

5.2 Conjectures about (μ + λ) EA

In the evolutionary computation community, the (μ + λ) EAs, which maintain μ parents and generate λ offspring
in each generation, have been investigated extensively over the past decades. The (1 + 1) and (1 + λ)
EAs studied in this paper are special cases of (μ + λ) EAs. Having shown the theoretical limitations of time-variable
mutation rate schemes for both EAs, a natural question is whether such theoretical results can be generalized to
other (μ + λ) EAs.

Assume that each time-variable mutation rate scheme of a (μ + λ) EA allows the algorithm to adopt λ (not
necessarily different) mutation rates when generating λ offspring at each generation. For any such EA, we
conjecture that time-variable mutation rate schemes fail to help it perform efficiently when the shifting
rate of a BDOP exceeds some threshold. However, for different settings of μ and λ, the concrete thresholds might be
different (as shown by our results). Intuitively, we conjecture that as the ratio λ/μ becomes larger, the threshold
of the shifting rate becomes higher. Nevertheless, this does not mean that the threshold can be raised arbitrarily by
increasing the ratio λ/μ; the maximal threshold of the shifting rate for any (μ + λ) EA might converge to $O(\log n/n)$.
When the shifting rate grows to $\omega(\log n/n)$, from one DOP phase to the next the global optimum of a BDOP
changes more than $\log n$ of its bits on average, which is too drastic for an EA to track. Rigorous proofs of
the above conjectures are left as future work.
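
The scale of these thresholds can be made concrete with a short calculation. The sketch below assumes, as an illustration, that each of the $n$ bits of the optimum flips independently with probability $\sigma$ in one DOP phase, so the expected number of changed bits is $n\sigma$; the function name `expected_changed_bits` and the three sample rates are hypothetical choices, with $\log^2 n/n$ standing in for one example of an $\omega(\log n/n)$ rate.

```python
import math


def expected_changed_bits(n, sigma):
    """Expected number of optimum bits that change in one DOP phase,
    assuming each of the n bits flips independently with probability sigma."""
    return n * sigma


for n in (2**10, 2**14, 2**18):
    log_n = math.log2(n)
    slow = expected_changed_bits(n, log_n / n**2)   # rate of order log n / n^2
    edge = expected_changed_bits(n, log_n / n)      # rate of order log n / n
    fast = expected_changed_bits(n, log_n**2 / n)   # one omega(log n / n) example
    print(n, slow, edge, fast)
```

At a rate of order $\log n/n^2$ the optimum changes far less than one bit per phase on average, whereas in the $\omega(\log n/n)$ example the expected change exceeds $\log n$ bits.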



2 As stated, when λ = 1, the (1 + λ) EA is equivalent to the (1 + 1) EA.



5.3 Impact of Population on Evolutionary Dynamic Optimization 

We study both the (1 + 1) and (1 + λ) EAs in this paper so that the impact of population can be demonstrated. The
former is an individual-based EA, and the latter can be regarded as a population-based EA adopting the
multiple-offspring strategy (a concrete way of utilizing a population). Our theoretical results clearly demonstrate the positive
impact of population on the performance of EAs in terms of BDOPs with distinct shifting rates. To be specific, in
the absence of the multiple-offspring strategy, the largest shifting rate of the BDOP class that a (1 + 1) EA can deal
with efficiently is $O(\log n/n^2)$ (Theorems [1] and [3]). After adopting the multiple-offspring strategy, the (1 + λ) EA
can efficiently solve the BitMatching$_D$ problem with a shifting rate growing to $\sigma < 1/(5n)$. In the evolutionary
computation field, this is the first time that this positive impact is validated in the context of evolutionary dynamic
optimization.

5.4 Adaptation of Mutation Rate is not a Panacea 

In this paper, the effectiveness of time-variable mutation rate schemes is investigated on two testbeds, that is, the
(1 + 1) and (1 + λ) EAs. On a BDOP whose global optimum is constantly shifting, one might expect that there is
some time-variable mutation scheme which can assist the EA in tracking the optimum by "cleverly" and dynamically
choosing appropriate mutation rates, such that the mutation rates of an EA can "fit" the stochastic movement of
the global optimum. However, for both the (1 + 1) and (1 + λ) EAs we show that there are classes of BDOPs on
which time-variable mutation rate schemes fail to help EAs perform efficiently. Moreover, our theoretical
analysis has further been applied to a concrete instance of the BDOP class called BitMatching$_D$. When
optimizing the BitMatching$_D$ problem whose shifting rate exceeds the theoretical threshold $\Theta(\log n/n^2)$, no
time-variable mutation rate scheme can assist the (1 + 1) EA in optimizing the problem efficiently. When optimizing the
BitMatching$_D$ problem whose shifting rate exceeds the theoretical bound $\Theta(\log n/n)$, no time-variable mutation
rate scheme can assist the (1 + λ) EA (λ is polynomial in $n$) in optimizing the problem efficiently. Given the fact that
the static BitMatching problem can be solved by the (1 + 1) EA within $O(n \ln n)$ generations [19], and by the (1 + λ)
EA within $O(n \ln n)$ function evaluations given an appropriate λ [32], for both EAs the hardness of BitMatching$_D$
mainly comes from the movement of the global optimum. For real-world DOPs with not-too-simple stationary
objective functions and a moving global optimum, it is highly likely that even a well-designed time-variable mutation
rate scheme is insufficient to improve the performance of an EA, or that a "promising" time-variable mutation
rate scheme does not exist at all. Meanwhile, noting that designing a delicate time-variable scheme could be rather
time-consuming, it might be better to follow the well-known Occam's razor and use some fixed mutation rate, unless one
can ensure that the optima of the DOPs are not moving too fast.

6 Conclusion and Future Work 

In this paper, we theoretically study the relationship between time-variable mutation rate schemes and the time
complexity of EAs on a class of DOPs. The analytical results are given in terms of the first hitting time of finding the
moving global optimum. By decomposing the search space and estimating transitions among the resultant subspaces
(intervals), our analysis shows that, when optimizing a class of DOPs, theoretical limitations do exist for both (1 + 1)
and (1 + λ) EAs with any time-variable mutation rate. Such theoretical results may lead to a new understanding of the
role of mutation in solving DOPs: although some specific time-variable mutation schemes have been proven or validated
to be helpful on some static optimization problems, it might not be beneficial to seek some sophisticated
time-variable mutation rate scheme to improve the performance of EAs on many DOPs with moving global optima.

It is worth noting that we have not taken the interactions among the adaptations of parameters in different 
operators (e.g., mutation and crossover) or strategies (e.g., population) of EAs into account. It is possible that the 
combinations of different strategies can improve the performances of EAs on DOPs. Nevertheless, it seems likely 
that the EA will still meet some new theoretical limitation when optimizing BDOPs. In the future, we will try to 
carry out such theoretical studies following the methodology utilized in this paper, so as to gain more insight into 
the adaptations of EAs' operators and strengthen the theoretical foundations of EAs. 

A Analytical Tools 

Before providing the proofs of Lemma [1] and Theorem [1], we need to introduce a number of lemmas. First, three
mathematical tools from the literature are presented directly without proofs.



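Lemma 4 (Chernoff bounds [35]). Let $a_1, a_2, \dots, a_k \in \{0, 1\}$ be $k$ independent random variables with the same
distribution:

\[ \forall i \ne j : P(a_i = 1) = P(a_j = 1), \]

where $i, j \in \{1, \dots, k\}$. Let $a = \sum_{i=1}^{k} a_i$. Then

• $\forall\, 0 < \delta < 1$:

\[ P\big(a < (1 - \delta)E[a]\big) < e^{-E[a]\delta^2/2}. \tag{10} \]

• $\forall\, \delta < 2e - 1$:

\[ P\big(a > (1 + \delta)E[a]\big) < e^{-E[a]\delta^2/4}. \tag{11} \]

• $\forall\, \delta > 0$:

\[ P\big(a > (1 + \delta)E[a]\big) < \left(\frac{e^{\delta}}{(1 + \delta)^{1+\delta}}\right)^{E[a]}. \tag{12} \]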
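
As a sanity check, the three bounds can be compared with exact binomial tails. The sketch below assumes the standard forms of Eqs. (10) to (12), namely $e^{-E[a]\delta^2/2}$, $e^{-E[a]\delta^2/4}$ and $(e^{\delta}/(1+\delta)^{1+\delta})^{E[a]}$; the helper names and the parameters $k = 200$, $P(a_i = 1) = 0.1$ are hypothetical choices made only for illustration.

```python
from math import comb, exp


def binom_pmf(k, m, p):
    """P(a = m) for a ~ Binomial(k, p)."""
    return comb(k, m) * p**m * (1 - p) ** (k - m)


def lower_tail(k, p, delta):
    """Exact P(a < (1 - delta) * E[a])."""
    mean = k * p
    return sum(binom_pmf(k, m, p) for m in range(k + 1) if m < (1 - delta) * mean)


def upper_tail(k, p, delta):
    """Exact P(a > (1 + delta) * E[a])."""
    mean = k * p
    return sum(binom_pmf(k, m, p) for m in range(k + 1) if m > (1 + delta) * mean)


def bound_10(mean, delta):
    return exp(-mean * delta**2 / 2)


def bound_11(mean, delta):
    return exp(-mean * delta**2 / 4)


def bound_12(mean, delta):
    return (exp(delta) / (1 + delta) ** (1 + delta)) ** mean


k, p, delta = 200, 0.1, 0.5
mean = k * p
print(lower_tail(k, p, delta), bound_10(mean, delta))
print(upper_tail(k, p, delta), bound_11(mean, delta), bound_12(mean, delta))
```

In each printed line the first value is the exact tail and the remaining values are the corresponding Chernoff bounds, which should dominate it.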

Chernoff bounds are widely used in the theoretical analysis of EAs [TH1 E3 [H] , and play an important role in proving
the theoretical results presented in this paper. Moreover, we present the following lemma proven by Droste [17].

Lemma 5 (Droste [17]). Let $w_1, \dots, w_i, v_1, \dots, v_j \in \{0, 1\}$ be $i + j$ independent random variables with the same
distribution, then

*(£«*>£«*) 

\k=l k=l J J 



Lemma 6 ([22]). Given integers c and d,

,c + kj\d—k) \c + d 



B Transition Lemmas with Proofs 
B.1 Transition Lemmas

Based on the three basic lemmas and the definitions and notations introduced in previous sections, we present several
"transition lemmas" concerning the transition probabilities between different subintervals of $[0, n]$, in response to
the DOP change and the mutation operator of the EA. The purpose of these lemmas is to package the propositions
concerning the transitions between different intervals defined in Section [3], so that they can be directly utilized in the
proofs of Lemmas [1] and [2] and Theorems [1] and [2]. As a result, those proofs can be significantly simplified. Given
the notations $N_t^{(p)}$, $N_t^{(x)}$ and $N_t$ defined in Section [4.2], the transition lemmas can be presented as follows:

Lemma 7. Given any BDOP with $\sigma = \omega(\log n/n)$ and $\sigma \in (0, 1/2]$, for the $t$-th generation ($t \in \mathbb{N}$) of the (1 + λ) EA
with the mutation rate configuration $\{P_m(n, t, x) \in [0, 1] : x \in \{1, \dots, \lambda\}\}$,

1. if $N_{t-1} = i = o(n)$ and $N_{t-1} = i > n/\log^2 n$ ($t \ge 1$), then the probability of $N_t^{(p)} = j \in [n - n/\log^3 n, n]$ is
super-polynomially small.

2. if $\exists$ constants $\epsilon_1, \epsilon_2 \in (0, 1)$ such that $\epsilon_2 < \epsilon_1$ and $\epsilon_2 n < N_{t-1} = i < \epsilon_1 n$, then the probability of $N_t^{(p)} = j \in
[n - n/\log^3 n, n]$ is super-polynomially small.

3. if $N_{t-1} = i = n - o(n)$ ($t \ge 1$), then the probability of $N_t^{(p)} = j > N_{t-1} - i\sigma/4$ is super-polynomially small,
where $i\sigma/4 = \omega(\log n)$.

4. the above three propositions also hold if $N_t^{(p)}$ is replaced by $N_t^{(x)}$, and $i\sigma$ is replaced by $i(P_m(n, t, x) + \sigma -
2P_m(n, t, x)\sigma)$ (in the 3rd proposition).

Lemma 8. Given any BDOP with $\sigma \in (0, 1/2]$, for the $t$-th generation ($t \in \mathbb{N}$) of the (1 + λ) EA with the mutation
rate configuration $\{P_m(n, t, x) \in [0, 1] : x \in \{1, \dots, \lambda\}\}$,

1. if $N_t^{(p)} = i = o(n)$, $N_t^{(p)} = i > n/\log^2 n$ and $P_m(n, t, x) = \omega(\log n/n)$, then the probability of $N_t^{(x)} = j \in
[n - n/\log^3 n, n]$ is super-polynomially small.

2. if $\exists$ constants $\epsilon_1, \epsilon_2 \in (0, 1)$ such that $\epsilon_2 < \epsilon_1$, $\epsilon_2 n < N_t^{(p)} = i < \epsilon_1 n$ and $P_m(n, t, x) = \omega(\log n/n)$, then the
probability of $N_t^{(x)} = j \in [n - n/\log^3 n, n]$ is super-polynomially small.

3. if $N_t^{(p)} = i = n - o(n)$ and $P_m(n, t, x) = \omega(\log n/n)$, then the probability of $N_t^{(x)} = j > N_t^{(p)} - iP_m(n, t, x)/4$
in one generation is super-polynomially small, where $iP_m(n, t, x)/4 = \omega(\log n)$.

4. the above three propositions also hold if "$t \in \mathbb{N}$" is replaced by "$t \in \mathbb{N}^+$", $N_t^{(p)}$ is replaced by $N_{t-1}$, and
$iP_m(n, t, x)$ is replaced by $i(P_m(n, t, x) + \sigma - 2P_m(n, t, x)\sigma)$ (in the 3rd proposition).

Lemma 9. Given the BitMatching$_D$ problem with $\sigma \in (0, 1/2]$, for the $t$-th generation ($t \in \mathbb{N}$) of the (1 + λ) EA
with the mutation rate configuration $\{P_m(n, t, x) \in [0, 1] : x \in \{1, \dots, \lambda\}\}$,

1. if $P_m(n, t, x) = \omega(\log n/n)$, and $\exists$ constants $\epsilon_1$ and $\epsilon_2$ such that $0 < \epsilon_2 < \epsilon_1 < 1$, $\epsilon_2 n < N_t^{(p)} = i < \epsilon_1 n$, then
the probability of $N_t^{(x)} = j \in [0, n/\log^2 n]$ is super-polynomially small.

2. if $N_t^{(p)} = i = o(n)$ and $P_m(n, t, x) = \omega(\log n/n)$, then the probability of $N_t^{(x)} = j < N_t^{(p)} + (n - i)P_m(n, t, x)/4$
is super-polynomially small, where $(n - i)P_m(n, t, x)/4 = \omega(\log n)$.

Lemma 10. Given any BDOP with $\sigma = \omega(\log n/n)$ and $\sigma \in (0, 1/2]$, for the $t$-th generation ($t \in \mathbb{N}$) of the (1 + λ)
EA with the mutation rate configuration $\{P_m(n, t, x) \in [0, 1 - 1/\log n] : x \in \{1, \dots, \lambda\}\}$ (specifically, if the BDOP
is BitMatching$_D$, then the condition $\{P_m(n, t, x) \in [0, 1 - 1/\log n] : x \in \{1, \dots, \lambda\}\}$ can be further relaxed to
$\{P_m(n, t, x) \in [0, 1] : x \in \{1, \dots, \lambda\}\}$),

1. if $N_{t-1} = i = n - o(n)$ and $N_{t-1} = i < n - n/\log^3 n$, then the probability of $N_t^{(p)} = j \in [0, n/\log^2 n]$ is
super-polynomially small.

2. if $\exists$ constants $\epsilon_1, \epsilon_2 \in (0, 1)$ such that $\epsilon_2 < \epsilon_1$, and $\epsilon_2 n < N_{t-1} = i < \epsilon_1 n$, then the probability of $N_t^{(p)} = j \in
[0, n/\log^2 n]$ is super-polynomially small.

3. if $N_{t-1} = i = o(n)$, then the probability of $N_t^{(p)} = j < N_{t-1} + (n - i)\sigma/4$ is super-polynomially small, where
$(n - i)\sigma/4 = \omega(\log n)$.

4. the above three propositions also hold if $N_t^{(p)}$ is replaced by $N_t^{(x)}$, and $(n - i)\sigma$ is replaced by $(n - i)(P_m(n, t, x) +
\sigma - 2P_m(n, t, x)\sigma)$ (in the 3rd proposition).

When λ = 1, the above lemmas hold for the (1 + 1) EA. The proofs of the above lemmas are very similar to each
other. For the sake of brevity, we only provide the detailed proofs of Lemmas [7] and [10].1.

B.2 Proof of Lemma [7] 

To prove the transition lemmas, we need the following lemma: 

Lemma 11. Let $r(n, t, x) = P_m(n, t, x)(1 - \sigma) + \sigma(1 - P_m(n, t, x))$ be the composite bitwise mapping rate for the
$x$-th offspring generated at the $t$-th generation ($t \in \mathbb{N}^+$). It satisfies that

\[ \forall t \in \mathbb{N}^+ : r(n, t, x) \in \Big[\min\Big\{\frac{1}{2}, \max\{\sigma, P_m(n, t, x)\}\Big\},\ \max\Big\{\frac{1}{2}, P_m(n, t, x)\Big\}\Big]. \tag{13} \]

Noting that $\sigma \in (0, 1/2]$, we have:

1. If $P_m(n, t, x) \in [0, 1/2]$,

\[ r(n, t, x) = \sigma + (1 - 2\sigma)P_m(n, t, x) \le \sigma + \frac{1}{2}(1 - 2\sigma) = \frac{1}{2}, \]
\[ r(n, t, x) = P_m(n, t, x) + (1 - 2P_m(n, t, x))\sigma \ge P_m(n, t, x), \]
\[ r(n, t, x) = \sigma + (1 - 2\sigma)P_m(n, t, x) \ge \sigma; \]

2. If $P_m(n, t, x) \in (1/2, 1]$,

\[ r(n, t, x) = P_m(n, t, x) + (1 - 2P_m(n, t, x))\sigma \le P_m(n, t, x), \]
\[ r(n, t, x) = \sigma + (1 - 2\sigma)P_m(n, t, x) \ge \sigma + \frac{1}{2}(1 - 2\sigma) = \frac{1}{2}. \]

By summarizing the above inequalities, we have

\[ r(n, t, x) \in \Big[\min\Big\{\frac{1}{2}, \max\{\sigma, P_m(n, t, x)\}\Big\},\ \max\Big\{\frac{1}{2}, P_m(n, t, x)\Big\}\Big]. \]

□
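
The interval in Lemma 11 can be checked numerically on a grid. The sketch below assumes the bounds are read as the closed interval $[\min\{1/2, \max\{\sigma, P_m\}\}, \max\{1/2, P_m\}]$; the function names and the grid resolution are hypothetical, and a small tolerance absorbs floating-point rounding.

```python
def composite_rate(p_m, sigma):
    """Composite bitwise mapping rate r = P_m (1 - sigma) + sigma (1 - P_m)."""
    return p_m * (1 - sigma) + sigma * (1 - p_m)


def lemma11_interval(p_m, sigma):
    """Lower and upper bounds on r, as read from Eq. (13)."""
    lower = min(0.5, max(sigma, p_m))
    upper = max(0.5, p_m)
    return lower, upper


def check_grid(steps=50, tol=1e-12):
    """Check the interval for sigma in (0, 1/2] and P_m in [0, 1] on a grid."""
    for a in range(1, steps + 1):
        sigma = 0.5 * a / steps
        for b in range(steps + 1):
            p_m = b / steps
            r = composite_rate(p_m, sigma)
            lower, upper = lemma11_interval(p_m, sigma)
            if not (lower - tol <= r <= upper + tol):
                return False
    return True


print(check_grid())
```

The check exercises both cases of the proof, $P_m \le 1/2$ and $P_m > 1/2$, including the boundary $P_m = 1/2$ where $r = 1/2$ for every $\sigma$.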



B.2.1 Lemma [7].1 to Lemma [7].3



We prove the three propositions of the lemma respectively. 

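Proof of Lemma [7].1. Noting that $\sigma \le 1/2$ holds, we estimate the probability that in one generation the EA finds the
number of matching bits $N_t^{(p)} = j \in [n - n/\log^3 n, n]$:

\[
\begin{aligned}
& P\Big(N_t^{(p)} = j \,\Big|\, N_{t-1} = i = o(n),\ N_{t-1} = i > \frac{n}{\log^2 n},\ j \in \Big[n - \frac{n}{\log^3 n}, n\Big],\ \sigma = \omega\Big(\frac{\log n}{n}\Big)\Big) \\
&= \sum_{k=0}^{\min\{i, n-j\}} \binom{i}{k}\binom{n-i}{j-i+k} \sigma^{j-i+2k} (1 - \sigma)^{n-j+i-2k} \\
&\le \sigma^{j-i} \sum_{k=0}^{\min\{i, n-j\}} \binom{i}{k}\binom{n-i}{n-j-k} = \binom{n}{n-j} \sigma^{j-i} \\
&\le n^{n/\log^3 n} \Big(\frac{1}{2}\Big)^{n-o(n)} = 2^{o(n)} \Big(\frac{1}{2}\Big)^{n-o(n)} \prec \frac{1}{SuperPoly(n)},
\end{aligned}
\]

which is a super-polynomially small probability.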
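
The counting behind this estimate can be checked exactly for small $n$. The sketch below assumes that every bit of the optimum flips independently with probability $\sigma$, so that moving from $i$ to $j$ matching bits requires flipping $k$ matching bits and $j - i + k$ non-matching bits for some $k$. It also assumes that the upper bound takes the Vandermonde form $\binom{n}{n-j}\sigma^{j-i}$; both function names are hypothetical.

```python
from math import comb


def transition_prob(n, i, j, sigma):
    """Exact P(j matching bits after the change | i matching bits before),
    with each of the n bits flipping independently with probability sigma."""
    total = 0.0
    for k in range(i + 1):          # k matching bits are flipped (lost)
        gained = j - i + k          # non-matching bits flipped (gained)
        if 0 <= gained <= n - i:
            flips = k + gained
            total += (comb(i, k) * comb(n - i, gained)
                      * sigma**flips * (1 - sigma) ** (n - flips))
    return total


def vandermonde_bound(n, i, j, sigma):
    """Assumed upper bound C(n, n - j) * sigma^(j - i), intended for j >= i."""
    return comb(n, n - j) * sigma ** (j - i)


n, i, sigma = 12, 3, 0.3
for j in range(i, n + 1):
    print(j, transition_prob(n, i, j, sigma), vandermonde_bound(n, i, j, sigma))
```

The bound is loose for $j$ close to $i$ but decays geometrically in $j - i$, which is what drives the super-polynomially small estimate when $j - i = n - o(n)$.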

Proof of Lemma [7].2. The following probability can be estimated similarly to the first case of the proof of Lemma
[7].1. For the sake of brevity, we provide the result directly:

\[ P\Big(N_t^{(p)} = j \,\Big|\, \epsilon_2 n < N_{t-1} = i < \epsilon_1 n,\ j \in \Big[n - \frac{n}{\log^3 n}, n\Big],\ \sigma = \omega\Big(\frac{\log n}{n}\Big)\Big) \prec \frac{1}{SuperPoly(n)}, \]

which is a super-polynomially small probability.

Proof of Lemma [7].3. We prove this result by applying Chernoff bounds to the matching bits and non-matching
bits respectively. After a DOP change, if the number of flipped matching bits is no smaller than that of the flipped
non-matching bits, then the number of matching bits cannot increase after the DOP change.

According to the condition of Lemma [7].3, $N_{t-1} = i = n - o(n)$ holds, thus the number of non-matching bits is
$n - i = o(n)$. Let $D_t^+$ and $D_t^-$ be the numbers of flipped non-matching and matching bits after the DOP change at
the beginning of the $t$-th generation, respectively. According to Chernoff bounds, we have

(l \ / \ &(ncr) / \ w(logn) 

Dt > y I N t-i =i = n- o(n), a = U (^))<(-^) = -^—) 
4 V n J J \c(n) ) \uj(\ogn) J 

1 

-< 
where $c(n)$ is a polynomial function of the problem size $n$ that satisfies $c(n) = o(n\sigma)$ and $c(n) = \omega(1)$. On the other
hand, for the number of flipped matching bits, we can also use Chernoff bounds:

^u, _._ ./ x _ ..f lo S n \\ „ (»-.(«)> „_ w n oe „l , 1 



Super Poly (n) 



Thus, given the condition $N_{t-1} = i = i(n) = n - o(n)$ (where $i(n)$ is a function of the problem size $n$), the following
probability is super-polynomially small:

D+ + % > Dr I Nt-i =n- o(n),a = a/ 1 "" " 



4 ' 1 * * v " V n 



fc=0 



^ pfc = ft | Nt_i = n - o(n),a = W (^)V + £ > fe | M_! = n - o(n),a = w(^) 
As a consequence,

\[ P\Big(D_t^+ - D_t^- > -\frac{i\sigma}{4} \,\Big|\, N_{t-1} = n - o(n),\ \sigma = \omega\Big(\frac{\log n}{n}\Big)\Big) \prec \frac{1}{SuperPoly(n)} \]

holds. Combining the above inequality with the fact

\[ N_t^{(p)} = N_{t-1} + D_t^+ - D_t^-, \]

we have proven Lemma [7].3. □
B.2.2 Lemma [7].4

In the context of the details redefined in Lemma [7].4, we prove the three propositions of the lemma respectively. In this
proof, the number of matching bits of the $x$-th offspring individual generated at the $t$-th generation (i.e., $N_t^{(x)}$) will be
considered.

Proof of Lemma [7].1. We need to consider two different cases: $P_m(n, t, x) \le 1 - h$ and $P_m(n, t, x) > 1 - h$, where
$h \in (0, 1)$ is a constant.

- First Case: $P_m(n, t, x) \le 1 - h$ holds. Given the conditions that $\sigma = \omega(\log n/n)$ and $P_m(n, t, x) \le 1 - h$, by
applying Lemma [11] we obtain:

\[ r(n, t, x) = \omega(\log n/n), \]
\[ r(n, t, x) \le \max\Big\{\frac{1}{2}, 1 - h\Big\}. \]

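By the above inequalities, we estimate the probability that in one generation the EA finds the number of matching bits
$N_t^{(x)} = j \in [n - n/\log^3 n, n]$:

\[
\begin{aligned}
& P\Big(N_t^{(x)} = j \,\Big|\, N_{t-1} = i = o(n),\ N_{t-1} = i > \frac{n}{\log^2 n},\ j \in \Big[n - \frac{n}{\log^3 n}, n\Big],\ P_m(n, t, x) \le 1 - h,\ \sigma = \omega\Big(\frac{\log n}{n}\Big)\Big) \\
&= \sum_{k=0}^{\min\{i, n-j\}} \binom{i}{k}\binom{n-i}{j-i+k} r(n, t, x)^{j-i+2k} \big(1 - r(n, t, x)\big)^{n-j+i-2k} \\
&\le r(n, t, x)^{j-i} \sum_{k=0}^{\min\{i, n-j\}} \binom{i}{k}\binom{n-i}{n-j-k} = \binom{n}{n-j} r(n, t, x)^{j-i} \le n^{n-j} \max\Big\{\frac{1}{2}, 1 - h\Big\}^{j-i} \quad \text{(by Lemma 6)} \\
&\le n^{n/\log^3 n} \max\Big\{\frac{1}{2}, 1 - h\Big\}^{n-o(n)} = 2^{o(n)} \max\Big\{\frac{1}{2}, 1 - h\Big\}^{n-o(n)} \prec \frac{1}{SuperPoly(n)},
\end{aligned}
\]

which is a super-polynomially small probability.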

- Second Case: $P_m(n, t, x) > 1 - h$ holds. Given the conditions that $\sigma = \omega(\log n/n)$ and $P_m(n, t, x) > 1 - h$, by
applying Lemma [11] we obtain:

\[ r(n, t, x) \ge \min\Big\{\frac{1}{2}, \max\{\sigma, P_m(n, t, x)\}\Big\} \ge \min\Big\{\frac{1}{2}, \max\{\sigma, 1 - h\}\Big\} \ge \min\Big\{\frac{1}{2}, 1 - h\Big\}. \]

On the other hand, we note a fact called symmetrical bitwise mapping: given that the number
of matching bits $N_{t-1}$ equals $i$ and the composite bitwise mapping rate is $r(n, t, x)$, the consequence of the bitwise
mapping is equivalent to that of the case in which $N_{t-1}$ equals $n - i$ and the composite bitwise mapping rate equals
$1 - r(n, t, x)$.

Formally, we have

\[ 1 - r(n, t, x) \le 1 - \min\Big\{\frac{1}{2}, 1 - h\Big\} = \max\Big\{\frac{1}{2}, h\Big\}. \]

Noting the fact described above, we know that for every $i$ satisfying $i = o(n)$ and $i > n/\log^2 n$, the following equation holds in
response to the symmetrical bitwise mapping:

\[
\begin{aligned}
& P\Big(N_t^{(x)} = j \,\Big|\, N_{t-1} = i,\ i = o(n),\ i > \frac{n}{\log^2 n},\ j \in \Big[n - \frac{n}{\log^3 n}, n\Big],\ r(n, t, x)\Big) \\
&= P\Big(N_t^{(x)} = j \,\Big|\, N^*_{t-1} = n - i,\ i = o(n),\ i > \frac{n}{\log^2 n},\ j \in \Big[n - \frac{n}{\log^3 n}, n\Big],\ r^*(n, t, x) = 1 - r(n, t, x)\Big),
\end{aligned}
\]

where we use $r^*(n, t, x)$ to represent the notional composite bitwise mapping rate with the value of $1 - r(n, t, x)$, and
$N^*_{t-1}$ to represent the notional number of matching bits (found by the EA) at the end of the $(t - 1)$-th generation.
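
The symmetrical bitwise mapping can be verified exactly for small $n$. The sketch below assumes each bit is remapped (flipped) independently with probability $r$, and compares the distribution reached from $i$ matching bits at rate $r$ with the one reached from $n - i$ matching bits at rate $1 - r$; the function name `transition_prob` and the parameter values are hypothetical.

```python
from math import comb


def transition_prob(n, i, j, r):
    """Exact P(j matching bits afterwards | i matching bits before),
    with each of the n bits flipping independently with probability r."""
    total = 0.0
    for k in range(i + 1):          # k matching bits are flipped (lost)
        gained = j - i + k          # non-matching bits flipped (gained)
        if 0 <= gained <= n - i:
            flips = k + gained
            total += comb(i, k) * comb(n - i, gained) * r**flips * (1 - r) ** (n - flips)
    return total


n, i, r = 10, 3, 0.8
for j in range(n + 1):
    direct = transition_prob(n, i, j, r)
    mirrored = transition_prob(n, n - i, j, 1 - r)
    print(j, direct, mirrored)
```

Intuitively, flipping each bit with probability $r$ is the same as first complementing all bits (which turns $i$ matching bits into $n - i$) and then flipping each bit with probability $1 - r$, so the two columns agree up to rounding.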

According to the value of $r^*(n, t, x)$, there are two further subcases to consider. In the first subcase,
$r^*(n, t, x) = O(\log n/n)$ holds. By Chernoff bounds, we know that with an overwhelming probability there are at
most $\log^2 n$ flipped bits among the total $n$ bits:



P(M#> - A5LJ < W'n | r-(„, t , x) - 0(!fl)) > 1 - (j^) 
Consequently, with an overwhelming probability the number of matching bits will decrease or increase by at most
$\log^2 n$ after the overall bitwise mapping. It follows from $N^*_{t-1} = n - i$, $i = o(n)$ and $i > n/\log^2 n$ that the above upper
bound implies that $N_t^{(x)} < n - n/\log^2 n + \log^2 n < n - n/\log^3 n$ and $N_t^{(x)} > n/\log^2 n$ hold with an overwhelming
probability. In other words,

\[ P\Big(N_t^{(x)} = j \,\Big|\, N^*_{t-1} = n - i,\ i = o(n),\ i > \frac{n}{\log^2 n},\ j \in \Big[n - \frac{n}{\log^3 n}, n\Big],\ r^*(n, t, x) = O\Big(\frac{\log n}{n}\Big)\Big) \prec \frac{1}{SuperPoly(n)}. \]



In the second subcase, $r^*(n, t, x) = \omega(\log n/n)$ holds. In the proof of Lemma [7].3, we will consider the case
$r(n, t, x) = \omega(\log n/n)$. Since there is no essential difference between the proofs related to $r^*(n, t, x)$ and $r(n, t, x)$,
we do not provide the proof here for the sake of brevity. For details, one can refer to the proof of Lemma [7].3 below.



Combining the above two subcases together, we obtain that: 
Thus we have finished the proof of the second case. Combining the first and second cases together, we have proven
Lemma [7].1.

Proof of Lemma [7].2. For convenience, we omit the index "(n, t)" in the proof, since the rest of the proof
is restricted to the $t$-th generation only. The following probability can be estimated similarly to the first case of
the proof of Lemma [7].1. For the sake of brevity, we provide the result directly:



. r n l /logrzA 

Nt+i = J I £ 2« < N t = 1 < ein,j £n - - — 3— ,n ,r = uj[ 

L log n - 1 V n / 
which is a super-polynomially small probability. Thus we have proven Lemma [7].2.

Proof of Lemma [7].3. We prove this result by applying Chernoff bounds to the matching bits and non-matching
bits respectively. In one generation, if the number of flipped matching bits is no smaller than that of the flipped
non-matching bits, then the number of matching bits cannot increase.

According to the condition of Lemma [7].3, the number of matching bits at the $(t - 1)$-th generation satisfies
$N_{t-1} = i = n - o(n)$, thus the number of non-matching bits is $n - i = o(n)$. Let $L_t^+$ and $L_t^-$ be the numbers of
flipped non-matching and matching bits after the DOP change and the mutation at the $t$-th generation, respectively.
By Chernoff bounds, we have

G(nr) / \ u(logn) 
where $c(n)$ is a polynomial function of the problem size $n$ that satisfies $c(n) = o(nr)$ and $c(n) = \omega(1)$. On the other
hand, for the number of flipped matching bits, we also apply Chernoff bounds:

Lt < — | AT t _! =i = n - o(n),r = uj — ^ < e s = e ~ w(losn) -< 



2 V n ) ) Super Poly (n) 

Thus we know that, given the condition Nt = i = i(n) = n — o(n) (where i(n) is a function of the problem size n), 



the following probability is super-polynomially close to 0: 
Noting that

\[ N_t^{(x)} = N_{t-1} + L_t^+ - L_t^-, \]

we have proven Lemma [7].3. □
B.3 Proof of Lemma [10].1

Lemma [10].1 has two versions. The first version presents a general result for any BDOP whose shifting rate satisfies
$\sigma = \omega(\log n/n)$ and $\sigma \in (0, 1/2]$, where the time-variable mutation rate should be no larger than $1 - 1/\log n$. The
second version is for the BitMatching$_D$ problem whose shifting rate satisfies $\sigma = \omega(\log n/n)$ and $\sigma \in (0, 1/2]$,
where the time-variable mutation rate can take any value between 0 and 1.

B.3.1 General result for BDOP Class

Let us first study $N_t^{(p)}$ under the conditions $N_{t-1} = n - o(n)$ and $N_{t-1} < n - n/\log^3 n$. According to Chernoff bounds,
we know that with an overwhelming probability there are at most $3n/4$ flipped bits among the total $n$ bits after the
DOP change at the beginning of the $t$-th generation:

\[ P\Big(\big|N_t^{(p)} - N_{t-1}\big| \le \frac{3n}{4} \,\Big|\, \sigma \in \Big(0, \frac{1}{2}\Big]\Big) \ge 1 - e^{-n/24} \succ 1 - \frac{1}{SuperPoly(n)}. \]

Noting that the range between $n/\log^2 n$ and $n - o(n)$ is much larger than $3n/4$ (i.e., $n - o(n) - n/\log^2 n > 3n/4$),
we know that the probability of $N_t^{(p)} \in \mathbb{L}_1$, conditional on $N_{t-1} = n - o(n)$ and $N_{t-1} < n - n/\log^3 n$, is
super-polynomially close to 0. Formally,

\[ P\Big(N_t^{(p)} \le \frac{n}{\log^2 n} \,\Big|\, N_{t-1} = n - o(n),\ N_{t-1} < n - \frac{n}{\log^3 n}\Big) \prec \frac{1}{SuperPoly(n)}. \tag{14} \]

Thus the original version of Lemma [10].1 for the general BDOP class is proven.

Let us further consider the proof when $N_t^{(p)}$ is replaced by $N_t^{(x)}$ in Lemma [10].1. It follows from the two conditions
$\sigma = \omega(\log n/n)$ and $P_m(n, t, x) \le 1 - 1/\log n$ that $r(n, t, x) \le 1 - 1/\log n$. According to Lemma [11], we estimate the
probability that $N_t^{(x)} \in \mathbb{L}_1$, conditional on $N_{t-1} \in A_2 \cup A_1$ and $r(n, t, x) \le 1 - 1/\log n$. Letting $L_t^-$ be the number of
flipped matching bits after the DOP change and the mutation at the $t$-th generation, we have

N^ x) gLi I N t - 1 = n-o(n),N t - l < n - — ^-,r(n,t,x) < 1 - ; 

log n log n 
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V v ' log n log n log ny 
On one hand, if $r(n, t, x) \ge 1/(4e)$, then by Chernoff bounds we further have

iV t (x) eLi I N t -i = n-o(n),N t -i < n--^-,— < r{n,t,x) < 1 
where $\rho_1 = (N_{t-1} - n/\log^2 n)/(N_{t-1} - N_{t-1}/\log n) - 1 = \Theta(1)$ (by $N_{t-1} = n - o(n)$), and $E[L_t^-] = E[L_t^- \mid N_{t-1} =
n - o(n), N_{t-1} < n - n/\log^3 n, 1/(4e) \le r(n, t, x) \le 1 - 1/\log n] = N_{t-1} r(n, t, x) = \Omega(n)$ (by $N_{t-1} = n - o(n)$ and
$r(n, t, x) \ge 1/(4e)$).

On the other hand, let us consider the case $r(n, t, x) < 1/(4e)$. By Chernoff bounds, we have

7V t (x) GLi I iVt_i = n-o(n),N t -i < n - —^-,r(n,t, X ) < -r- 

log n 4e 



77 77 I 

—5— = (I + P2MW} I JV t _! = n-o(n),iV t _i < n - — g-,r(n,t,x) < 7- 
log n log n 4e 
where $E[L_t^-] = E[L_t^- \mid N_{t-1} = n - o(n), N_{t-1} < n - n/\log^3 n, r(n, t, x) < 1/(4e)] < n/(4e)$ and $\rho_2 = (N_{t-1} -
n/\log^2 n)/E[L_t^-] - 1 > (n/2)/(n/(4e)) - 1 \ge 2e - 1$. Thus we have proven Lemma [10].1.

B.3.2 Specific result for BitMatching$_D$

By Chernoff bounds, we know that with an overwhelming probability there are at most $3n/4$ flipped bits among the
total $n$ bits after the DOP change at the beginning of the $t$-th generation:

\[ P\Big(\big|N_t^{(p)} - N_{t-1}\big| \le \frac{3n}{4} \,\Big|\, \sigma \in \Big(0, \frac{1}{2}\Big]\Big) \ge 1 - e^{-n/24} \succ 1 - \frac{1}{SuperPoly(n)}. \tag{17} \]

Consequently, with an overwhelming probability, the number of matching bits will decrease or increase by at most
$3n/4$ after the DOP change. Recalling that $N_{t-1} = i = n - o(n)$ is one of the conditions of Lemma [10].1, we know that
$N_t^{(p)} > n/\log^2 n$ holds with an overwhelming probability. Thus we have proven the original version of Lemma [10].1.

Meanwhile, let us consider the alternative version of Lemma [10].1, where $N_t^{(p)}$ is replaced by $N_t^{(x)}$. Since the EA
always preserves the one with better fitness between the parent and offspring individuals, we know that $N_t \ge N_t^{(p)}$
always holds given the BitMatching$_D$ problem. Combining the above fact with Eq. [17], we obtain the alternative
version of Lemma [10].1. □



C Proofs of Lemmas [1] and [2], and Theorems [4] and [6]

C.1 Lemmas [1] and [2]

The only difference between the proofs of Lemmas [1] and [2] is that the former utilizes the original version of Lemma
[10] for the general BDOP class, while the latter utilizes the specific version of Lemma [10] for the BitMatching$_D$
problem. Hence, we only provide a unified proof for the sake of brevity. As mentioned in Section [3.1], the proof
contains the analysis related to Propositions A[1]1, A[1]2, and A[1]3.
Here we study the above propositions one after another.

Analysis of Proposition A[1]1. As the first step, we prove that the initial number of matching bits satisfies
that $N_0^{(p)} \in \mathbb{O}_1$ holds with an overwhelming probability. Since the initial individual is generated randomly by the
uniform distribution, we estimate the following two probabilities by Chernoff bounds (Lemma [4]):



V 4 / Super Poly (n) ' 

In other words, $N_0^{(p)} \in [n/4, 3n/4]$ (where $[n/4, 3n/4] \subset \mathbb{O}_1$) holds with an overwhelming probability.

Given the condition $N_0^{(p)} \in [n/4, 3n/4]$, we now prove that the probability of $N_0^{(o)} \in \mathbb{F}_1 \cup \mathbb{L}_1$ is super-polynomially
close to 0. According to the mutation rate at the 0-th generation $P_m(n, 0)$, there are two cases:

- First case ($P_m(n, 0) = \omega(\log n/n)$): According to Lemmas [8].2 and [9].1, we know that:

\[ P\Big(N_0^{(o)} \in \mathbb{F}_1 \cup \mathbb{L}_1 \,\Big|\, N_0^{(p)} \in \Big[\frac{n}{4}, \frac{3n}{4}\Big],\ P_m(n, 0) = \omega\Big(\frac{\log n}{n}\Big)\Big) \prec \frac{1}{SuperPoly(n)}. \]

- Second case ($P_m(n, 0) = O(\log n/n)$): By Chernoff bounds, we know that with an overwhelming probability
there are at most $\log^2 n$ flipped bits among the total $n$ bits after mutation (which implies that the number of matching
bits can decrease or increase by at most $\log^2 n$ after mutation):

iV (O) -NP\ < log 2 n NP £ \^],P m (n,0) = o(^)) 



□ (log 2 n) 



4' 4 
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fi(log n) J Super Poly{n) 

In other words, £ [log 2 n + n/4, log 2 ?i + 3n/4] holds with an overwhelming probability (given the condition 

that N^ P) £ [n/4,3n/4],P m (n,0) = 0(log n/n)). It then follows from AT (O) £ [log 2 n + n/4, log 2 n + 3n/4] £ Fj ULi 
that Aq ' </ Fi U Li holds with an overwhelming probability (given the conditions that £ [n/A, 3n/4] and 

P m (n,0) = 0(logn/n)). 

Combining the above facts for $N_0^{(p)}$ and $N_0^{(o)}$, we obtain

\[ P(N_0 \in \mathbb{F}_1 \cup \mathbb{L}_1) \prec \frac{1}{SuperPoly(n)}. \tag{18} \]

Analysis of Proposition A[1]2. Given the condition $N_{t-1} \in \mathbb{O}_1$ ($t \in \mathbb{N}^+$), we now prove that the probability
of the event $N_t^{(p)} \in \mathbb{F}_1 \cup \mathbb{L}_1$ is super-polynomially close to 0. The condition $N_{t-1} \in \mathbb{O}_1$ leads to one of the following
cases, and Lemmas [7] and [10] provide the corresponding probability results:

Case i [$N_{t-1} = o(n)$ and $N_{t-1} > n/\log^2 n$ both hold]: By Lemmas [7].1 and [10].3, the probability of the event
$N_t^{(p)} \in \mathbb{F}_1 \cup \mathbb{L}_1$, conditional on $N_{t-1} = o(n)$ and $N_{t-1} > n/\log^2 n$, is super-polynomially close to 0;

Case ii [$\exists$ constants $\epsilon_1$ and $\epsilon_2$ such that $0 < \epsilon_2 < \epsilon_1 < 1$ and $\epsilon_2 n < N_{t-1} = i < \epsilon_1 n$]: By Lemmas [7].2
and [10].2, the probability of the event $N_t^{(p)} \in \mathbb{F}_1 \cup \mathbb{L}_1$, conditional on the event that $\exists$ constants $\epsilon_1$ and $\epsilon_2$ such
that $0 < \epsilon_2 < \epsilon_1 < 1$ and $\epsilon_2 n < N_{t-1} = i < \epsilon_1 n$, is super-polynomially close to 0;

Case iii [$N_{t-1} = n - o(n)$ and $N_{t-1} < n - n/\log^3 n$ both hold]: By Lemmas [7].3 and [10].1, the probability
of the event $N_t^{(p)} \in \mathbb{F}_1 \cup \mathbb{L}_1$, conditional on $N_{t-1} = n - o(n)$ and $N_{t-1} < n - n/\log^3 n$, is super-polynomially
close to 0.

By summarizing the above results, we have:

\[ P\big(N_t^{(p)} \in \mathbb{F}_1 \cup \mathbb{L}_1 \,\big|\, N_{t-1} \in \mathbb{O}_1\big) \prec \frac{1}{SuperPoly(n)}. \tag{19} \]

Analysis of Proposition A[1]3. Next we aim at proving that the joint probability of the events $N_t^{(p)} \in \mathbb{O}_1$
and $N_t^{(o)} \in \mathbb{F}_1 \cup \mathbb{L}_1$, conditional on $N_{t-1} \in \mathbb{O}_1$, is super-polynomially close to 0. The above proposition is a direct
corollary of the following inequality:

\[ P\big(N_t^{(p)} \in \mathbb{F}_1 \cup \mathbb{L}_1, N_t^{(o)} \in \mathbb{F}_1 \cup \mathbb{L}_1 \,\big|\, N_{t-1} \in \mathbb{O}_1\big) + P\big(N_t^{(p)} \in \mathbb{O}_1, N_t^{(o)} \in \mathbb{F}_1 \cup \mathbb{L}_1 \,\big|\, N_{t-1} \in \mathbb{O}_1\big) \prec \frac{1}{SuperPoly(n)}. \]

Meanwhile, the above inequality is equivalent to the following inequality:

\[ P\big(N_t^{(o)} \in \mathbb{F}_1 \cup \mathbb{L}_1 \,\big|\, N_{t-1} \in \mathbb{O}_1\big) \prec \frac{1}{SuperPoly(n)}. \tag{20} \]

Hence, to prove Proposition A[1]3, we only need to prove Eq. [20].

As we know, the condition $N_{t-1} \in \mathbb{O}_1$ always implies three potential cases, and Lemmas [7] and [10] provide the
corresponding probability results:

Case i [$N_{t-1} = o(n)$ and $N_{t-1} > n/\log^2 n$ both hold]: By Lemmas [7].1 and [7].4³, the probability of $N_t^{(o)} \in \mathbb{F}_1$,
conditional on $N_{t-1} = o(n)$ and $N_{t-1} > n/\log^2 n$, is super-polynomially close to 0; by Lemmas [10].3 and [10].4,
the probability of $N_t^{(o)} \in \mathbb{L}_1$, conditional on $N_{t-1} = o(n)$, is super-polynomially close to 0.

Case ii [$\exists$ constants $\epsilon_1$ and $\epsilon_2$ such that $0 < \epsilon_2 < \epsilon_1 < 1$ and $\epsilon_2 n < N_{t-1} = i < \epsilon_1 n$]: By Lemmas [7].2 and
[7].4, the probability of $N_t^{(o)} \in \mathbb{F}_1$, conditional on the event that $\exists$ constants $\epsilon_1$ and $\epsilon_2$ such that $0 < \epsilon_2 < \epsilon_1 < 1$
and $\epsilon_2 n < N_{t-1} = i < \epsilon_1 n$, is super-polynomially close to 0; by Lemmas [10].2 and [10].4, the probability of
the event $N_t^{(o)} \in \mathbb{L}_1$, conditional on the event that $\exists$ constants $\epsilon_1$ and $\epsilon_2$ such that $0 < \epsilon_2 < \epsilon_1 < 1$ and
$\epsilon_2 n < N_{t-1} = i < \epsilon_1 n$, is super-polynomially close to 0.

Case iii [$N_{t-1} = n - o(n)$ and $N_{t-1} < n - n/\log^3 n$ both hold]: By Lemmas [7].3 and [7].4, the probability of the
event $N_t^{(o)} \in \mathbb{F}_1$, conditional on $N_{t-1} = n - o(n)$ and $N_{t-1} < n - n/\log^3 n$, is super-polynomially close to
0; by Lemmas [10].1 and [10].4, the probability of the event $N_t^{(o)} \in \mathbb{L}_1$, conditional on $N_{t-1} = n - o(n)$ and
$N_{t-1} < n - n/\log^3 n$, is super-polynomially close to 0.

By summarizing the above results, we can obtain Eq. [20]. Hence we have proven that:

\[ P\big(N_t^{(p)} \in \mathbb{O}_1, N_t^{(o)} \in \mathbb{F}_1 \cup \mathbb{L}_1 \,\big|\, N_{t-1} \in \mathbb{O}_1\big) \prec \frac{1}{SuperPoly(n)}. \tag{21} \]
Conclusion. Combining the probabilities described in Eqs. [19l and [2T1 together, we have 
r(N t E Fi ULi I Nt-i E ®i) 
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< P(V t (P) £ Fi U Li I JV t _i G ©1) + P(iV t (P) G ©1, N[ a) € Fi U Li I N t -i G ©i) -< 

holds Vt G N + . Let t$ 1 be defined as the first hitting time to the interval Fi, formally, 

t Fi = min > 0; (JV ( (P) G F x ) V (iV t {0) G Fi)}. (22) 

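Meanwhile, $\forall t \in \mathbb{N}^+$ we have:

$$\begin{aligned}
P(\tau_{F_1} = t) &\le P\left(N_t \in F_1 \cup L_1,\ N_{t-1} \in O_1, \ldots, N_0 \in O_1\right) \\
&= P\left(N_t \in F_1 \cup L_1 \mid N_{t-1} \in O_1, \ldots, N_0 \in O_1\right) \cdot P\left(N_{t-1} \in O_1, \ldots, N_0 \in O_1\right) \\
&= P\left(N_t \in F_1 \cup L_1 \mid N_{t-1} \in O_1\right) \cdot P\left(N_{t-1} \in O_1, \ldots, N_0 \in O_1\right) \\
&\le P\left(N_t \in F_1 \cup L_1 \mid N_{t-1} \in O_1\right) \prec \frac{1}{SuperPoly(n)}.
\end{aligned}$$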



3 For the (1 + 1) EA, there is only one offspring individual at each generation, thus $N_t^{(O)} = N_t^{(1)}$.



Moreover, Eq. (15) implies $P(\tau_{F_1} = 0) \prec 1/SuperPoly(n)$. Consequently, $\forall g(n) \prec Poly(n)$, the first hitting time to the target interval $F_1$, which is denoted by $\tau_{F_1}$, satisfies

$$P\left(\tau_{F_1} \le g(n)\right) = \sum_{t \le g(n)} P(\tau_{F_1} = t) \le (g(n) + 1) \max_{t \le g(n)} \left\{P(\tau_{F_1} = t)\right\} \prec \frac{g(n) + 1}{SuperPoly(n)} \prec \frac{1}{SuperPoly(n)}.$$

In other words, the probability of $\tau_{F_1} \prec Poly(n)$ is super-polynomially close to 0. Consequently, we have proven Lemmas 1 and 5. □
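
The last step above multiplies a super-polynomially small per-generation probability by a polynomial number of generations. The following minimal sketch illustrates why the product stays small. It assumes, purely for illustration, the concrete choices $g(n) = n^d$ and a per-generation bound of $n^{-\log_2 n}$; neither choice comes from the paper.

```python
import math

def log2_hitting_bound(n: int, poly_degree: int) -> float:
    """Return log2 of (g(n) + 1) * p(n).

    Illustrative assumptions (not from the paper):
      g(n) = n**poly_degree   -- a polynomial number of generations
      p(n) = n**(-log2 n)     -- a super-polynomially small probability
    """
    g = n ** poly_degree
    log2_p = -(math.log2(n) ** 2)  # log2 of n**(-log2 n)
    return math.log2(g + 1) + log2_p

# The bound shrinks faster than any fixed polynomial can compensate for.
for n in (2**4, 2**8, 2**12):
    print(n, log2_hitting_bound(n, poly_degree=3))
```

Working with base-2 logarithms avoids floating-point underflow for large $n$.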



C.2 Theorems 4 and 5

The proof idea of Theorem 4 is quite similar to that of Lemma 1. After updating the definition of $N_t^{(O)}$ to $\max\{N_t^{(1)}, \ldots, N_t^{(\lambda)}\}$ in response to the multiple-offspring strategy employed by the $(1 + \lambda)$ EA, the main difference between the two proofs is that the transition events with respect to every generation of the $(1 + \lambda)$ EA must involve $\lambda$ mutations generating $\lambda$ different offspring individuals, while those of the (1 + 1) EA only consider a single mutation leading to a unique offspring individual. Meanwhile, the definition of $\tau_{F_1}$ must be updated accordingly:

$$\tau_{F_1} = \min\left\{t \ge 0;\ (N_t^{(P)} \in F_1) \vee (N_t^{(1)} \in F_1) \vee \cdots \vee (N_t^{(\lambda)} \in F_1)\right\}, \qquad (23)$$

where $N_t^{(1)}, \ldots, N_t^{(\lambda)}$ are the numbers of matching bits of the $\lambda$ offspring individuals at the $t$-th generation, respectively.

The proof of Theorem 4 can be obtained by replacing the probability propositions related to a unique mutation with probability propositions related to $\lambda$ mutations. The probability propositions, namely Propositions A1.1, A1.2, and A1.3, are all about probabilities that are super-polynomially close to 0, and similar arguments also hold for the $(1 + \lambda)$ EA, since the offspring size $\lambda$ is polynomial in $n$ and cannot increase a super-polynomially small probability (i.e., $1/SuperPoly(n)$) to a polynomially large probability (i.e., $1/Poly(n)$).

For the sake of brevity, we do not provide the proof of Theorem 4 in detail here. Instead, to illustrate the above proof idea, we only prove one probability proposition for the $(1 + \lambda)$ EA as an instance. The probability proposition, which is an important step towards proving Theorem 4, is the same as Proposition A1.3 except that $N_t^{(O)}$ is redefined as $\max\{N_t^{(1)}, \ldots, N_t^{(\lambda)}\}$ in response to the multiple-offspring strategy:

$$\forall t \in \mathbb{N}^+:\ P\left(N_t^{(P)} \in O_1,\ N_t^{(O)} \in F_1 \cup L_1 \mid N_{t-1} \in O_1\right) \prec \frac{1}{SuperPoly(n)}.$$
To prove the above proposition, we only need to prove the following inequality:

$$P\left(N_t^{(P)} \in F_1 \cup L_1,\ N_t^{(O)} \in F_1 \cup L_1 \mid N_{t-1} \in O_1\right) + P\left(N_t^{(P)} \in O_1,\ N_t^{(O)} \in F_1 \cup L_1 \mid N_{t-1} \in O_1\right) \prec \frac{1}{SuperPoly(n)},$$

which is equivalent to the following inequality:

$$P\left(N_t^{(O)} \in F_1 \cup L_1 \mid N_{t-1} \in O_1\right) \prec \frac{1}{SuperPoly(n)}.$$



As we have done in "Analysis of Proposition A1.3", the condition $N_{t-1} \in O_1$ can be divided into three potential cases, and Lemma 7 provides the corresponding probability results:

Case i [$N_{t-1} = o(n)$ and $N_{t-1} > n/\log^2 n$ both hold]: By Lemmas 7.1 and 7.4, the probability of $N_t^{(\chi)} \in F_1$ ($\forall \chi \in \{1, \ldots, \lambda\}$), conditional on $N_{t-1} = o(n)$ and $N_{t-1} > n/\log^2 n$, is super-polynomially close to 0. Adding up such probabilities with respect to $\lambda$ ($\lambda$ is polynomial in $n$) different offspring individuals (i.e., different $\chi$) yields an upper bound for the probability of $N_t^{(O)} \in F_1$ conditional on $N_{t-1} = o(n)$ and $N_{t-1} > n/\log^2 n$, which is still super-polynomially close to 0.

By Lemmas 10.3 and 10.4, the probability of $N_t^{(\chi)} \in L_1$, conditional on $N_{t-1} = o(n)$, is super-polynomially close to 0. Adding up such probabilities with respect to $\lambda$ ($\lambda$ is polynomial in $n$) different offspring individuals yields an upper bound for the probability of $N_t^{(O)} \in L_1$ conditional on $N_{t-1} = o(n)$ and $N_{t-1} > n/\log^2 n$, which is still super-polynomially close to 0.



Case ii [$\exists$ constants $\epsilon_1$ and $\epsilon_2$ such that $0 < \epsilon_2 < \epsilon_1 < 1$ and $\epsilon_2 n < N_{t-1} = i < \epsilon_1 n$]: By Lemmas 7.2 and 7.4, the probability of $N_t^{(\chi)} \in F_1$ ($\forall \chi \in \{1, \ldots, \lambda\}$), conditional on the event that $\exists$ constants $\epsilon_1$ and $\epsilon_2$ such that $0 < \epsilon_2 < \epsilon_1 < 1$ and $\epsilon_2 n < N_{t-1} = i < \epsilon_1 n$, is super-polynomially close to 0. Adding up such probabilities with respect to $\lambda$ ($\lambda$ is polynomial in $n$) different offspring individuals yields an upper bound for the probability of $N_t^{(O)} \in F_1$ conditional on the same event, which is still super-polynomially close to 0.

By Lemmas 10.2 and 10.4, the probability of the event $N_t^{(\chi)} \in L_1$, conditional on the event that $\exists$ constants $\epsilon_1$ and $\epsilon_2$ such that $0 < \epsilon_2 < \epsilon_1 < 1$ and $\epsilon_2 n < N_{t-1} = i < \epsilon_1 n$, is super-polynomially close to 0. Adding up such probabilities with respect to $\lambda$ ($\lambda$ is polynomial in $n$) different offspring individuals yields an upper bound for the probability of $N_t^{(O)} \in L_1$ conditional on the same event, which is still super-polynomially close to 0.

Case iii [$N_{t-1} = n - o(n)$ and $N_{t-1} < n - n/\log^3 n$ hold]: By Lemmas 7.3 and 7.4, the probability of the event $N_t^{(\chi)} \in F_1$ ($\forall \chi \in \{1, \ldots, \lambda\}$), conditional on $N_{t-1} = n - o(n)$ and $N_{t-1} < n - n/\log^3 n$, is super-polynomially close to 0. Adding up such probabilities with respect to $\lambda$ ($\lambda$ is polynomial in $n$) different offspring individuals yields an upper bound for the probability of $N_t^{(O)} \in F_1$ conditional on the same event, which is still super-polynomially close to 0.

By Lemmas 10.1 and 10.4, the probability of the event $N_t^{(\chi)} \in L_1$, conditional on $N_{t-1} = n - o(n)$ and $N_{t-1} < n - n/\log^3 n$, is super-polynomially close to 0. Adding up such probabilities with respect to $\lambda$ ($\lambda$ is polynomial in $n$) different offspring individuals yields an upper bound for the probability of $N_t^{(O)} \in L_1$ conditional on the same event, which is still super-polynomially close to 0.

By summarizing the above results, we have proven

$$\forall t \in \mathbb{N}^+:\ P\left(N_t^{(P)} \in O_1,\ N_t^{(O)} \in F_1 \cup L_1 \mid N_{t-1} \in O_1\right) \prec \frac{1}{SuperPoly(n)}.$$

□

The proof of Theorem 5 is almost the same as that of Theorem 4. The only difference between them is that the former utilizes the original version of Lemma 10 for the general BDOP class, while the latter utilizes the specific version of Lemma 10 for the BitMatchinGd problem. For the sake of brevity, we do not provide the details here.



D Proof of Theorems 1 and 2

The main difference between the proofs of Theorems 1 and 2 is that the former utilizes the result of Lemma 1 so as to restrict the analysis of the general BDOP class to the case $\sigma \le \delta \log n/n$, while the latter utilizes Lemma 2 to restrict the analysis of the BitMatchinGd problem to the case $\sigma \le \delta \log n/n$, where $\delta$ is an arbitrary positive constant. Here we only provide a unified proof for the sake of brevity. As mentioned in Section 3.1, the proof contains the analysis related to Propositions B1.1, B1.2, B1.3, B1.4, and B1.5. Here we study the above propositions one after another.

Theorems 1 and 2 are about the (1 + 1) EA, which generates a unique offspring individual at each generation. For the sake of simplicity, in the proof, $P_m(n,t,1)$, which represents the concrete mutation rate employed by the mutation of the parent individual at the $t$-th generation, is written as $P_m(n,t)$ for short, where we omit the offspring index. Similarly, the offspring index will also be omitted when applying the transition lemmas.

Analysis of Proposition B1.1. Since $(A_2 \cup A_1 \cup F_2 \cup B_2 \cup B_1 \cup L_2) \subset (F_1 \cup L_1)$, the proof of this proposition is the same as the "Analysis of Proposition A1.1" part in the proof of Lemmas 1 and 5.

Analysis of Proposition B1.2. Let $U_1 = U_1(n,\sigma) = \delta\gamma^{1/7}\log n$. According to Chernoff bounds, we know that with an overwhelming probability there are at most $U_1$ flipped bits among the total $n$ bits after the DOP change (which implies that the number of matching bits can decrease or increase by at most $U_1$ after the DOP change):

$$P\left(\left|N_t^{(P)} - N_{t-1}\right| > U_1 \,\middle|\, P_m(n,t) \le \frac{\gamma^{1/14}\log n}{n},\ \sigma \le \frac{\delta\log n}{n}\right) \prec \frac{1}{SuperPoly(n)}, \qquad (24)$$



where $t \in \mathbb{N}^+$ is the generation index. On the other hand, concerning the number of matching bits of the offspring at the $t$-th generation, we obtain another inequality by Chernoff bounds:

$$P\left(\left|N_t^{(O)} - N_{t-1}\right| > U_1 \,\middle|\, P_m(n,t) \le \frac{\gamma^{1/14}\log n}{n},\ \sigma \le \frac{\delta\log n}{n}\right) \le \left(\frac{e}{\Theta(\gamma^{1/14})}\right)^{U_1} = \left(\frac{e}{\omega(1)}\right)^{\omega(\log n)} \prec \frac{1}{SuperPoly(n)},$$

where we utilize the fact that the composite bitwise mapping rate (including both DOP change and mutation) within the $t$-th generation, denoted by $r(n,t)$, satisfies $r(n,t) = (1 - \sigma)P_m(n,t) + (1 - P_m(n,t))\sigma = P_m(n,t) + \sigma - 2P_m(n,t)\sigma \le 2\gamma^{1/14}\log n/n$ (since $P_m(n,t) \le \gamma^{1/14}\log n/n$ and $\sigma \le \delta\log n/n \le \gamma^{1/14}\log n/n$ hold).
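
The composite bitwise mapping rate can be checked numerically. The sketch below is only an illustration: the function name `composite_rate` and the sample values are assumptions, not part of the paper. A bit ends up flipped exactly when one of the two independent operations (DOP change with rate $\sigma$, mutation with rate $P_m$) flips it and the other does not.

```python
def composite_rate(p_m: float, sigma: float) -> float:
    """Probability that a bit is flipped after an independent DOP flip
    (rate sigma) followed by an independent mutation flip (rate p_m)."""
    return (1 - sigma) * p_m + (1 - p_m) * sigma

# The two algebraic forms used in the text agree, and the rate never
# exceeds p_m + sigma, which gives the factor-2 bound when both rates
# are at most the same threshold.
p_m, sigma = 0.1, 0.2
r = composite_rate(p_m, sigma)
print(r, p_m + sigma - 2 * p_m * sigma)
```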

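Noting that $N_t \in \{N_t^{(P)}, N_t^{(O)}\}$, we obtain the following result by combining the above two inequalities:

$$\begin{aligned}
&P\left(\left|N_t - N_{t-1}\right| > U_1 \,\middle|\, P_m(n,t) \le \frac{\gamma^{1/14}\log n}{n},\ \sigma \le \frac{\delta\log n}{n}\right) \\
&\quad\le P\left(\left|N_t^{(P)} - N_{t-1}\right| > U_1 \,\middle|\, P_m(n,t) \le \frac{\gamma^{1/14}\log n}{n},\ \sigma \le \frac{\delta\log n}{n}\right) \\
&\qquad + P\left(\left|N_t^{(O)} - N_{t-1}\right| > U_1 \,\middle|\, P_m(n,t) \le \frac{\gamma^{1/14}\log n}{n},\ \sigma \le \frac{\delta\log n}{n}\right) \prec \frac{1}{SuperPoly(n)}.
\end{aligned}$$

Consequently, we have

$$P\left(\left|N_t - N_{t-1}\right| \le U_1 \,\middle|\, P_m(n,t) \le \frac{\gamma^{1/14}\log n}{n},\ \sigma \le \frac{\delta\log n}{n}\right) \succ 1 - \frac{1}{SuperPoly(n)},$$

thus we have proven Proposition B1.2.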
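
The Chernoff-type tail bounds used in this analysis have the shape $(e\mu/k)^k$, where $\mu$ is the expected number of flipped bits and $k > \mu$ is the threshold. As a hedged illustration, the sketch below compares this bound with the exact binomial tail for small, arbitrarily chosen parameters; the function names and numbers are assumptions made only for this example.

```python
import math

def binom_upper_tail(n: int, p: float, k: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k, n + 1))

def chernoff_upper_tail(mu: float, k: int) -> float:
    """Chernoff-type bound (e * mu / k)**k on P(X >= k), valid for k > mu,
    where X is a sum of independent Bernoulli variables with mean mu."""
    return (math.e * mu / k) ** k

n, p, k = 1000, 0.005, 20          # mean number of flips: mu = 5
exact = binom_upper_tail(n, p, k)
bound = chernoff_upper_tail(n * p, k)
print(exact, bound)                # the exact tail lies below the bound
```

When the threshold exceeds the mean by a growing factor, as $U_1$ does in the text, the base $e\mu/k$ tends to 0 and the bound decays faster than any inverse polynomial.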

Analysis of Proposition B1.3. Let us consider Proposition B1.3a first. Proposition B1.1 tells us that $N_0 \notin A_2 \cup A_1 \cup F_2 \cup B_2 \cup B_1 \cup L_2$ holds with an overwhelming probability. To arrive at $A_2 \cup A_1 \cup F_2$ at some generation (e.g., the $t$-th generation), the EA has two choices when deciding the mutation rate of the $t$-th generation:

1. Small Mutation Rate (SMR): $P_m(n,t) \le \gamma^{1/14}\log n/n$;

2. Large Mutation Rate (LMR): $P_m(n,t) > \gamma^{1/14}\log n/n$.



Figure 3: Illustration of proof for Proposition B1.3. (Diagram not reproduced: an interval decomposition with SMR and LMR panels, in which arrows mark "potential" one-generation SMR transitions and crossed arrows mark "unlikely" one-generation transitions.)



Let us investigate the case of reaching $A_2 \cup A_1 \cup F_2$ by adopting SMR from $F_1 \setminus (A_2 \cup A_1 \cup F_2)$, $O_1$ and $L_1 \setminus L_2$. Since in Proposition B1.2 we have proven that SMR can only provide a relatively small increment $U_1$ for the number of matching bits with an overwhelming probability, we know that the only region from which it is possible$^4$ to reach $A_2 \cup A_1 \cup F_2$ by adopting SMR is a subset of $F_1 \setminus (A_2 \cup A_1 \cup F_2)$.

According to Proposition B1.2, once SMR is used, the number of matching bits can increase by at most $U_1 = \delta\gamma^{1/7}\log n$ with an overwhelming probability. Given the condition that $N_{t-1} \in F_1 \setminus (A_2 \cup A_1 \cup F_2)$, the fact $G = \omega(U_1)$ (with respect to the problem size $n$) implies that even the maximal one-generation increment $U_1$ is not enough to jump over the interval $A_2$ with an overwhelming probability. Thus, in this case, the EA must reach some intermediate point belonging to $A_2$ first (otherwise the first hitting time to $F_2$ has already been proven to be super-polynomial, which is the final conclusion of the theorem). In other words, we have proven that one way to reach $F_2$ is to reach $A_2$ first.
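
The relation $G = \omega(U_1)$ follows from the definitions $G = \gamma^{4/7}\log n$ and $U_1 = \delta\gamma^{1/7}\log n$, whose ratio is $\gamma^{3/7}/\delta$. The sketch below evaluates this ratio for a few sample values; the helper name and the sample values of $\gamma$, $\delta$ and $n$ are illustrative assumptions, and the ratio grows only when $\gamma$ grows with $n$.

```python
import math

def gap_over_step(gamma: float, delta: float, n: int) -> float:
    """Ratio G / U_1 with G = gamma**(4/7) * log n and
    U_1 = delta * gamma**(1/7) * log n (so the log n factors cancel)."""
    G = gamma ** (4 / 7) * math.log(n)
    U1 = delta * gamma ** (1 / 7) * math.log(n)
    return G / U1

# The ratio equals gamma**(3/7) / delta and is independent of n.
for gamma in (2**7, 2**14, 2**21):
    print(gamma, gap_over_step(gamma, delta=2.0, n=10**6))
```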

Now we consider the case in which the EA reaches $A_2 \cup A_1 \cup F_2$ by adopting LMR. Here two subcases must be considered further. In the first subcase, the offspring resulting from the LMR is not preserved by the selection operator of the EA; instead, the parent is preserved by the selection operator. This subcase can be viewed as adopting SMR with the value 0 (which will not mutate the parent at all), and thus can be included in the analysis of the last two paragraphs.

In the second subcase, the offspring resulting from the LMR is preserved by the selection operator of the EA. As shown in the third sketch in Fig. 3, we want to prove that $L_2$ is the only region from which it is possible to reach $A_2 \cup A_1 \cup F_2$ by a one-generation transition with LMR (and thus $L_2$ is the only region that can reach $F_2$ in one generation adopting LMR). To prove the above proposition, there are three intervals for us to exclude (prove to be "unlikely"): $L_1 \setminus L_2$, $O_1$ and $F_1 \setminus (A_2 \cup A_1 \cup F_2)$. According to Lemma 8, if $N_{t-1} \in O_1 \cup (F_1 \setminus (A_2 \cup A_1 \cup F_2)) = [n/\log^2 n, n - 3G)$ and $P_m(n,t) > \gamma^{1/14}\log n/n$, the probability of $N_t^{(O)} \in A_2 \cup A_1 \cup F_2$ is super-polynomially close to 0.

In other words, to prove Proposition B1.3a, we only need to consider the case in which $N_{t-1} \in L_1 \setminus L_2 = (4G, n/\log^2 n]$ and $P_m(n,t) > \gamma^{1/14}\log n/n$ hold. Given the above two conditions, below we prove that the probability of $N_t^{(O)} \in A_2 \cup A_1 \cup F_2 = [n - 3G, n]$ is super-polynomially close to 0. Given an arbitrary constant $h \in (0, 1)$, we distinguish two cases:

- if $\gamma^{1/14}\log n/n < P_m(n,t) \le 1 - h$ holds. Given the conditions that $\sigma \le \delta\log n/n$ and $\gamma^{1/14}\log n/n < P_m(n,t) \le 1 - h$, by applying Lemma 11 we obtain:

$$r(n,t) = \omega(\log n/n), \qquad r(n,t) \le \max\left\{\frac{1}{2},\ 1 - h\right\}.$$

By the above inequalities, we estimate the probability that in one generation the EA finds the number of matching bits $N_t^{(O)} = j \in A_2 \cup A_1 \cup F_2$:

Ni° } = j I Nt-i = i G Li \ L 2 , 3 G A 2 U Ai U F 2 , 7 ' logn < P m (n,t) < 1 - h 

n 

mitx{i,n—j} / . \ / A 

= E LlJlkj (iy^y~ l+2k (l-r(n,t)r-«- 4+2fc > 

min{i,n— j} / \ / A / \ 

< r(n,ty~ l ^2 I U ' 1 kj (fc) = ( n ■J r ( n >ty~ t ^ n ™ _J max | — , 1 — /i| (by Lemma [6] in Appendix) 

= n \ log n j max j — i — fo\ —2^1 m ax { - A — h> -< 7-^ , 

1 2 / 1 2 J Super Poly (n) ' 

which is a super-polynomially small probability. 

- if $P_m(n,t) > 1 - h$ holds (i.e., $P(P_m(n,t) > 1 - h) = 1$). On one hand, given the condition that $\sigma = \omega(\log n/n)$ and $P_m(n,t) > 1 - h$, by applying Lemma 11 we obtain:

$$r(n,t) \ge \min\left\{\frac{1}{2},\ \max\{\sigma, P_m\}\right\} \ge \min\left\{\frac{1}{2},\ \max\{\sigma, 1 - h\}\right\} \ge \min\left\{\frac{1}{2},\ 1 - h\right\}.$$

On the other hand, we must note the fact called symmetrical bitwise mapping (Fig. 1): given the condition that the number of matching bits $N_{t-1}$ equals $i$ and the composite bitwise mapping rate is $r(n,t)$, the consequence of the bitwise mapping is equivalent to that of the case in which $N_{t-1}$ equals $n - i$ and the composite bitwise mapping rate equals $1 - r(n,t)$. Formally, we have

$$1 - r(n,t) \le 1 - \min\left\{\frac{1}{2},\ 1 - h\right\} = \max\left\{1 - \frac{1}{2},\ h\right\} = \max\left\{\frac{1}{2},\ h\right\}.$$

4 Here "possible" means that the event has at least a polynomially large probability (there exists a positive polynomial function of the problem size $n$ such that the probability is no smaller than the reciprocal of the polynomial function).

Noting the fact described above, the following equation holds in response to the so-called symmetrical bitwise mapping:

$$\begin{aligned}
&P\left(N_t^{(O)} = j \mid N_{t-1} = i,\ i \in L_1 \setminus L_2,\ j \in A_2 \cup A_1 \cup F_2,\ r(n,t)\right) \\
&\quad = P\left(N_t^{(O)} = j \mid N_{t-1}^* = n - i,\ i \in L_1 \setminus L_2,\ j \in A_2 \cup A_1 \cup F_2,\ r^*(n,t) = 1 - r(n,t)\right),
\end{aligned}$$

where we use $r^*(n,t)$ to represent the notional composite bitwise mapping rate with the value $1 - r(n,t)$, and $N_{t-1}^*$ to represent the notional number of matching bits (found by the EA) at the end of the $(t-1)$-th generation.
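
The symmetry can be verified exactly on a small instance. The sketch below is an illustration under simplifying assumptions: every bit is flipped independently with the same rate $r$, and the function names are hypothetical. It computes the distribution of the number of matching bits after the mapping and compares the pair $(i, r)$ with the pair $(n - i, 1 - r)$.

```python
import math

def binom_pmf(m: int, p: float, k: int) -> float:
    """P(X = k) for X ~ Binomial(m, p)."""
    return math.comb(m, k) * p**k * (1 - p)**(m - k)

def next_matching_pmf(n: int, i: int, r: float) -> list:
    """Distribution of the number of matching bits after flipping each of
    n bits independently with probability r, starting from i matching bits."""
    pmf = [0.0] * (n + 1)
    for lost in range(i + 1):               # matching bits that get flipped
        for gained in range(n - i + 1):     # non-matching bits that get flipped
            pmf[i - lost + gained] += (binom_pmf(i, r, lost)
                                       * binom_pmf(n - i, r, gained))
    return pmf

a = next_matching_pmf(12, 3, 0.8)   # i matching bits, rate r
b = next_matching_pmf(12, 9, 0.2)   # n - i matching bits, rate 1 - r
print(max(abs(x - y) for x, y in zip(a, b)))
```

Flipping with rate $1 - r$ is the same as complementing every bit and then flipping with rate $r$, which is why the two distributions coincide.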
Figure 4: Sketch of the case $P_m(n,t) > 1 - h$, where $(L_1 \setminus L_2)^* = (n - n/\log^2 n, n - 4G)$. (Diagram not reproduced; crossed arrows mark "unlikely" one-generation transitions.)

As shown in Fig. 4, we only need to prove that the probability of reaching $A_2 \cup A_1 \cup F_2$ is super-polynomially close to 0, given the conditions that the notional number of matching bits $N_{t-1}^*$ belongs to $(L_1 \setminus L_2)^* = (n - n/\log^2 n, n - 4G)$ and the notional composite bitwise mapping rate satisfies $r^*(n,t) \le \max\{1/2, h\}$. According to the value of $r^*(n,t)$, there are two situations.

In the first situation, $r^*(n,t) = O(\log n/n)$ holds. According to Chernoff bounds, we know that with an overwhelming probability there are at most $\gamma^{1/14}\log n$ flipped bits among the total $n$ bits:

$$P\left(\left|N_t^{(O)} - N_{t-1}^*\right| \le \gamma^{1/14}\log n \,\middle|\, r^*(n,t) = O\left(\frac{\log n}{n}\right)\right) > 1 - \left(\frac{e}{\Theta(\gamma^{1/14})}\right)^{\gamma^{1/14}\log n} \succ 1 - \frac{1}{SuperPoly(n)}.$$

Consequently, with an overwhelming probability, the number of matching bits will decrease or increase by at most $\gamma^{1/14}\log n$ after the overall bitwise mapping (including DOP change and mutation). Given the conditions that $N_{t-1}^* = n - i$ and $i \in L_1 \setminus L_2$, the above upper bound implies that $N_t^{(O)} < n - 4G + \gamma^{1/14}\log n < n - 3G$ and $N_t^{(O)} \notin L_2$ hold with an overwhelming probability. In other words,

$$P\left(N_t^{(O)} = j \,\middle|\, N_{t-1}^* = n - i,\ i \in L_1 \setminus L_2,\ j \in A_2 \cup A_1 \cup F_2,\ r^*(n,t) = O\left(\frac{\log n}{n}\right)\right) \prec \frac{1}{SuperPoly(n)}.$$

In the second situation, $r^*(n,t) = \omega(\log n/n)$ holds. This case can be viewed as imposing a mutation with LMR $r^*(n,t) = \omega(\log n/n)$ on an individual whose number of matching bits belongs to $[n - n/\log^2 n, n - 4G)$. According to Lemma 8.3, we have:

$$P\left(N_t^{(O)} = j \,\middle|\, N_{t-1}^* = n - i,\ i \in L_1 \setminus L_2,\ j \in A_2 \cup A_1 \cup F_2,\ r^*(n,t) = \omega\left(\frac{\log n}{n}\right)\right) \prec \frac{1}{SuperPoly(n)}.$$

Since there is no essential difference between the proofs related to $r^*(n,t)$ and $r(n,t)$, we will not provide the proof here for the sake of brevity. For details, one can refer to the proof of Lemma 8.3 in the appendix.



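Combining the above two situations of $r^*(n,t)$ together, we obtain that:

$$\begin{aligned}
&P\left(N_t^{(O)} = j \,\middle|\, N_{t-1} = i,\ i \in L_1 \setminus L_2,\ j \in A_2 \cup A_1 \cup F_2,\ P_m(n,t) > 1 - h,\ \sigma \le \frac{\delta\log n}{n}\right) \\
&\quad = P\left(N_t^{(O)} = j \,\middle|\, N_{t-1} = i,\ i \in L_1 \setminus L_2,\ j \in A_2 \cup A_1 \cup F_2,\ r(n,t) \ge \min\left\{\frac{1}{2},\ 1 - h\right\}\right) \\
&\quad = P\left(N_t^{(O)} = j \,\middle|\, N_{t-1}^* = n - i,\ i \in L_1 \setminus L_2,\ j \in A_2 \cup A_1 \cup F_2,\ r^*(n,t) \le \max\left\{\frac{1}{2},\ h\right\}\right) \\
&\quad \prec \frac{1}{SuperPoly(n)}, \qquad (25)
\end{aligned}$$

where we utilize $P(P_m(n,t) > 1 - h,\ \sigma \le \delta\log n/n) = 1$ ($P_m(n,t) > 1 - h$ and $\sigma \le \delta\log n/n$ are preconditions of the analysis here) to obtain Eq. (25).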

By applying the total probability theorem [3B], we further combine the cases $\gamma^{1/14}\log n/n < P_m(n,t) \le 1 - h$ and $P_m(n,t) > 1 - h$ together. As a result, we obtain

$$P\left(N_t^{(O)} = j \,\middle|\, N_{t-1} \in L_1 \setminus L_2,\ j \in A_2 \cup A_1 \cup F_2,\ P_m(n,t) > \frac{\gamma^{1/14}\log n}{n},\ \sigma \le \frac{\delta\log n}{n}\right) \prec \frac{1}{SuperPoly(n)}.$$



So far, the cases of SMR and LMR have been analyzed rigorously, and we know that the probability of reaching $F_2$ from $(4G, n - 3G)$ by any one-generation transition is super-polynomially small. Recalling that we have also proven that the EA must reach $A_2$ before reaching $A_1 \cup F_2$ if SMR is used in the one-generation transition that reaches $A_2 \cup A_1 \cup F_2$, we have proven Proposition B1.3a.

The proof of Proposition B1.3b is similar to that of Proposition B1.3a. The major difference between the ideas of the two proofs is that, in Proposition B1.3a, the probability of reaching $A_2 \cup A_1 \cup F_2$ from $L_2$ (given the condition that $\sigma \le \delta\log n/n$) by a one-generation transition is not super-polynomially close to 0, while in Proposition B1.3b the probability of reaching $L_2$ from $A_2 \cup A_1$ (given the condition that $\sigma \le \delta\log n/n$) by a one-generation transition (including the shift of the optimum of the BDOP and the mutation of the solution) is super-polynomially close to 0. The brief proofs of the latter proposition, conditional on the general BDOP class (for proving Theorem 1) and the BitMatchinGd problem (for proving Theorem 2), are presented respectively.

(For general BDOP class) Noting that the shifting rate satisfies $\sigma \le \delta\log n/n$, we estimate the following probability by the Chernoff bound:

$$P\left(\left|N_t^{(P)} - N_{t-1}\right| > U_1 \,\middle|\, \sigma \le \frac{\delta\log n}{n}\right) \le \left(\frac{e}{\gamma^{1/7}}\right)^{U_1} = \left(\frac{e}{\omega(1)}\right)^{\omega(\log n)} \prec \frac{1}{SuperPoly(n)}.$$

Noting that the range between $L_2$ and $A_2 \cup A_1$ is much larger than $U_1$ (i.e., $n - 7G > U_1$), we know that, given the condition that $\sigma \le \delta\log n/n$, the probability of reaching $L_2$ from $A_2 \cup A_1$ by the parent of the $t$-th generation is super-polynomially close to 0. Formally,

$$P\left(N_t^{(P)} \in L_2 \,\middle|\, N_{t-1} \in A_2 \cup A_1,\ \sigma \le \frac{\delta\log n}{n}\right) \prec \frac{1}{SuperPoly(n)}. \qquad (26)$$



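Meanwhile, we estimate the probability that $N_t^{(O)} \in L_2$, conditional on $N_{t-1} \in A_2 \cup A_1$ and $r(n,t) \le 1 - 1/\log n$, where $r(n,t) \le 1 - 1/\log n$ is derived from $\sigma \le \delta\log n/n$ and $P_m(n,t) \le 1 - 1/\log n$ (a condition of Theorem 1) by Lemma 11. Let $L_t^-$ be the number of flipped matching bits after the DOP change and the mutation at the $t$-th generation; we have

$$\begin{aligned}
&P\left(N_t^{(O)} \in L_2 \,\middle|\, N_{t-1} \in A_2 \cup A_1,\ r(n,t) \le 1 - \frac{1}{\log n}\right) \\
&\quad = P\left(N_t^{(O)} \le 4G \,\middle|\, N_{t-1} \in [n - 3G, n - G),\ r(n,t) \le 1 - \frac{1}{\log n}\right) \\
&\quad \le P\left(n - \left(L_t^- + (n - N_{t-1})\right) \le 4G \,\middle|\, N_{t-1} \in [n - 3G, n - G),\ r(n,t) \le 1 - \frac{1}{\log n}\right) \\
&\quad = P\left(L_t^- \ge N_{t-1} - 4G \,\middle|\, N_{t-1} \in [n - 3G, n - G),\ r(n,t) \le 1 - \frac{1}{\log n}\right) \\
&\quad \le P\left(L_t^- \ge n - 7G \,\middle|\, N_{t-1} \in [n - 3G, n - G),\ r(n,t) \le 1 - \frac{1}{\log n}\right).
\end{aligned}$$

5 Since Lemma 2 has already discussed the case of $\sigma = \omega(\log n/n)$, we only need to consider the case of $\sigma \le \delta\log n/n$ here.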

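On one hand, if $r(n,t) \ge \frac{1}{4e}$, then we further have

$$\begin{aligned}
&P\left(N_t^{(O)} \in L_2 \,\middle|\, N_{t-1} \in A_2 \cup A_1,\ \frac{1}{4e} \le r(n,t) \le 1 - \frac{1}{\log n}\right) \\
&\quad \le P\left(L_t^- \ge (1 + \rho_1)\, n \left(1 - \frac{1}{\log n}\right) \ge (1 + \rho_1) E[L_t^-] \,\middle|\, N_{t-1} \in [n - 3G, n - G),\ \frac{1}{4e} \le r(n,t) \le 1 - \frac{1}{\log n}\right) \\
&\quad \le P\left(L_t^- \ge (1 + \rho_1) E[L_t^-] \,\middle|\, N_{t-1} \in [n - 3G, n - G),\ \frac{1}{4e} \le r(n,t) \le 1 - \frac{1}{\log n}\right) \\
&\quad \le e^{-E[L_t^-]\rho_1^2/4} \prec \frac{1}{SuperPoly(n)}, \qquad (27)
\end{aligned}$$

where $\rho_1 = (n - 7G)/(n - n/\log n) - 1 = \Theta(1/\log n)$ (since $G = \gamma^{4/7}\log n$ and $\gamma \le n/\log n$), $E[L_t^-] = E[L_t^- \mid N_{t-1} \in [n - 3G, n - G),\ 1/4e \le r(n,t) \le 1 - 1/\log n] = \Theta(n)$, and the last two inequalities are obtained by the Chernoff bound.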
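On the other hand, let us consider the case $r(n,t) < \frac{1}{4e}$. By the Chernoff bound, we have

$$\begin{aligned}
&P\left(N_t^{(O)} \in L_2 \,\middle|\, N_{t-1} \in A_2 \cup A_1,\ r(n,t) < \frac{1}{4e}\right) \\
&\quad \le P\left(L_t^- \ge n - 7G = (1 + \rho_2) E[L_t^-] \,\middle|\, N_{t-1} \in [n - 3G, n - G),\ r(n,t) < \frac{1}{4e}\right) \\
&\quad \le \left(\frac{e}{1 + \rho_2}\right)^{(1 + \rho_2) E[L_t^-]} = \left(\frac{e}{1 + \rho_2}\right)^{n - 7G} \prec \frac{1}{SuperPoly(n)}, \qquad (28)
\end{aligned}$$

where $E[L_t^-] = E[L_t^- \mid N_{t-1} \in [n - 3G, n - G),\ r(n,t) < 1/4e] \le n/4e$ and $\rho_2 = (n - 7G)/E[L_t^-] - 1 \ge (n/2)/(n/4e) - 1 = 2e - 1$ (since $G = \gamma^{4/7}\log n$ and $\gamma \le n/\log n$). Combining Eqs. (27) and (28) together, we obtain

$$P\left(N_t^{(O)} \in L_2 \,\middle|\, N_{t-1} \in A_2 \cup A_1,\ r(n,t) \le 1 - \frac{1}{\log n}\right) \prec \frac{1}{SuperPoly(n)}.$$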



Combining the above inequality with Eq. (26), we know that the probability of reaching $L_2$ from $A_2 \cup A_1$ by a one-generation transition is super-polynomially close to 0. The rest of the proof of Proposition B1.3b is similar to that of Proposition B1.3a.

(For BitMatchinGd) According to Chernoff bounds, with an overwhelming probability there are at most $U_1$ flipped bits among the total $n$ bits after the DOP change (which implies that the number of matching bits can decrease or increase by at most $U_1$ after the DOP change):

$$P\left(\left|N_t^{(P)} - N_{t-1}\right| > U_1 \,\middle|\, \sigma \le \frac{\delta\log n}{n}\right) \le \left(\frac{e}{\gamma^{1/7}}\right)^{\omega(\log n)} \prec \frac{1}{SuperPoly(n)}.$$

Noting that the range between $L_2$ and $A_2 \cup A_1$ is much larger than $U_1$ ($n - 7G > U_1$), we know that, given the condition that $\sigma \le \delta\log n/n$, the probability of reaching $L_2$ from $A_2 \cup A_1$ with the help of the DOP change at the beginning of the $t$-th generation is super-polynomially close to 0.

Moreover, since the selection operator of the EA always preserves the better individual between the parent and offspring (for BitMatchinGd, $N_t = \max\{N_t^{(P)}, N_t^{(O)}\} \ge N_t^{(P)}$ holds according to Eqs. [5] and 0), the above inequality also implies that $N_t \notin L_2$ holds with an overwhelming probability. In other words, the probability of reaching $L_2$ from $A_2 \cup A_1$ by a one-generation transition is super-polynomially close to 0. The rest of the proof of Proposition B1.3b is similar to that of Proposition B1.3a.

Analysis of Proposition B1.4. Let $U_2 = \gamma^{1/7}$. According to the condition $N_{t-1} \in A_2 \cup A_1 \cup F_2$, there are at most $3G$ non-matching bits at the end of the $(t-1)$-th generation. On the other hand, since the number of flipped non-matching bits is always no smaller than the final increment of the number of matching bits (after DOP change and/or mutation), we know that, to increase the number of matching bits by at least $U_2$, the number of flipped non-matching bits must be no smaller than $U_2$. Hence, concerning the number of matching bits of the parent after the DOP change at the $t$-th generation, we obtain



'(n^ - N t -i > U 2 I N t -i 6 A 2 U Ai U F a , ° < ^p) 



< po)^^- < (Mci°gy = ^*>yny"> (29) 

/ 3(5n 4 / 7 log 2 n \7 1/7 1 

V n J Super Poly(n) ' 

where we obtain Eqs. (29) and (30) by applying Eqs. (9) and (4), respectively.

Next, concerning the number of matching bits of the offspring at the $t$-th generation, we must consider two different cases. The EA adopts SMR at the $t$-th generation in the first case, while it adopts LMR at the $t$-th generation in the second case.

For the former case, we estimate the following inequality by noting the condition $N_{t-1} \in A_2 \cup A_1 \cup F_2$:

$$P\left(N_t^{(O)} - N_{t-1} > U_2 \,\middle|\, N_{t-1} \in A_2 \cup A_1 \cup F_2,\ P_m(n,t) \le \frac{\gamma^{1/14}\log n}{n},\ \sigma \le \frac{\delta\log n}{n}\right) \prec \frac{1}{SuperPoly(n)}, \qquad (31)$$

where we utilize the fact that the composite bitwise mapping rate $r(n,t)$ satisfies $r(n,t) = (1 - \sigma)P_m(n,t) + (1 - P_m(n,t))\sigma = P_m(n,t) + \sigma - 2P_m(n,t)\sigma \le 2\gamma^{1/14}\log n/n$ (since $P_m(n,t) \le \gamma^{1/14}\log n/n$ and $\sigma \le \delta\log n/n \le \gamma^{1/14}\log n/n$ hold).

On the other hand, let us investigate the latter case in which the EA adopts LMR at the $t$-th generation. According to the condition $N_{t-1} \in A_2 \cup A_1 \cup F_2$, $N_{t-1} = n - o(n)$ holds. By applying Lemma 8.3, we obtain

$$P\left(N_t^{(O)} > N_{t-1} \,\middle|\, N_{t-1} \in A_2 \cup A_1 \cup F_2,\ P_m(n,t) > \frac{\gamma^{1/14}\log n}{n},\ \sigma \le \frac{\delta\log n}{n}\right) \prec \frac{1}{SuperPoly(n)}. \qquad (32)$$

Combining Eqs. (31) and (32) together, we have:

$$P\left(N_t^{(O)} - N_{t-1} > U_2 \,\middle|\, N_{t-1} \in A_2 \cup A_1 \cup F_2,\ \sigma \le \frac{\delta\log n}{n}\right) \prec \frac{1}{SuperPoly(n)}. \qquad (33)$$

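Noting that $N_t \in \{N_t^{(P)}, N_t^{(O)}\}$, we obtain the following result by combining Eq. (30) with the above inequality:

$$\begin{aligned}
&P\left(N_t - N_{t-1} > U_2 \,\middle|\, N_{t-1} \in A_2 \cup A_1 \cup F_2,\ \sigma \le \frac{\delta\log n}{n}\right) \\
&\quad \le P\left(N_t^{(P)} - N_{t-1} > U_2 \,\middle|\, N_{t-1} \in A_2 \cup A_1 \cup F_2,\ \sigma \le \frac{\delta\log n}{n}\right) \\
&\qquad + P\left(N_t^{(O)} - N_{t-1} > U_2 \,\middle|\, N_{t-1} \in A_2 \cup A_1 \cup F_2,\ \sigma \le \frac{\delta\log n}{n}\right) \prec \frac{1}{SuperPoly(n)}.
\end{aligned}$$

Consequently,

$$P\left(N_t - N_{t-1} \le U_2 \,\middle|\, N_{t-1} \in A_2 \cup A_1 \cup F_2,\ \sigma \le \frac{\delta\log n}{n}\right) \succ 1 - \frac{1}{SuperPoly(n)}. \qquad (34)$$

Hence, we know that once the EA is in $A_2 \cup A_1 \cup F_2$, the number of matching bits can only increase by at most $U_2$ in one generation with an overwhelming probability, no matter which mutation rate the EA adopts.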

Analysis of Proposition B1.5. As we have shown in Proposition B1.3, so far we have not excluded the possibility$^6$ of the following two events:

1. The EA reaches $F_2$ via $B_2$ and then via $L_2$ by multiple-generation transition (it is possible that the EA reaches $F_2$ from $L_2$ by a large enough LMR, e.g., the mutation rate 1).

2. The EA reaches $F_2$ via $A_2$ (without reaching some intermediate points in $L_2$) by multiple-generation transition.

Hence, the proof of Proposition B1.5 contains two parts. First, we need to prove that, if the EA has already reached $B_2$, then with an overwhelming probability it cannot travel through $B_1$ and reach $L_2$. Second, we need to prove that, if the EA has already reached $A_2$, then with an overwhelming probability it cannot travel through $A_1$ and reach $F_2$. Once the above results have been proven, reaching $F_2$ requires some event with super-polynomially small probability (e.g., the EA reaches $F_2$ from $A_2$ by a one-generation transition) to happen, which leads to a super-polynomial first hitting time with an overwhelming probability.

Since the ideas of the two parts are quite similar, we only provide the details of the second part (which will lead to the final conclusion of the theorem) for the sake of brevity. Assume that we have proven the first part, that is, if the EA has already reached $B_2$, then with an overwhelming probability it cannot travel through $B_1$ and finally reach $L_2$. Since, according to Proposition B1.4, the EA must reach $L_2$ or $A_2$ first in order to reach $F_2$, we know that the only choice to reach $F_2$ is via $A_2$. To reach $F_2$ via $A_2$, the EA must travel through $A_1$. The reason is given by Proposition B1.4 and the fact that $A_1$ has length $G > U_2$.

Next, we provide the proof of the aforementioned proposition: if the EA has already reached $A_2$, then with an overwhelming probability it cannot travel through $A_1$ and reach $F_2$ (as we have mentioned, by the same technique we can prove the similar result for $L_2$). For $\forall t \in \mathbb{N}^+$, given the conditions that $N_{t-1} \in A_2 \cup A_1$ and $N_{t-1} = i$, we let the probabilities of decreasing and increasing the number of matching bits be $p^-(n,i,t)$ and $p^+(n,i,t)$, respectively, i.e.,

$$p^-(n,i,t) = P\left(N_t < N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ \sigma \le \frac{\delta\log n}{n}\right),$$

$$p^+(n,i,t) = P\left(N_t > N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ \sigma \le \frac{\delta\log n}{n}\right).$$

Next we prove that, $\forall t \in \mathbb{N}^+$ that satisfies $N_{t-1} = i \in A_2 \cup A_1$, the following two inequalities hold:

$$p^-(n,i,t) \ge p_1^-(n) = \Theta\left(\frac{\gamma\log n}{n}\right), \qquad (35)$$

$$p^+(n,i,t) \le p_1^+(n) = \Theta\left(\frac{\gamma^{4/7}\log n}{n}\right), \qquad (36)$$

where $p_1^- = p_1^-(n)$ is a general lower bound of $p^-(n,i,t)$, and $p_1^+ = p_1^+(n)$ is a general upper bound of $p^+(n,i,t)$.

To prove the bound for $p^-(n,i,t)$, we need to consider two cases. In the first case, the EA adopts SMR at the $t$-th generation; in the second case, the EA adopts LMR at the $t$-th generation. Concerning the first case, we estimate the following probability:

$$\begin{aligned}
&P\left(N_t < N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ P_m(n,t) \le \frac{\gamma^{1/14}\log n}{n},\ \sigma \le \frac{\delta\log n}{n}\right) \\
&\quad \ge (1 - \sigma)^{n-i}\left(1 - P_m(n,t)\right)^{n-i} \sum_{k=1}^{i} \binom{i}{k}\sigma^k (1 - \sigma)^{i-k}\left(1 - P_m(n,t)^k\right),
\end{aligned}$$

where we consider the case in which all the non-matching bits are not flipped during the DOP change and mutation (this event has probability $(1 - \sigma)^{n-i}(1 - P_m(n,t))^{n-i}$) while some of the matching bits are flipped by the DOP change and at least one of these flipped matching bits is not flipped again by mutation (this event has probability $\sum_{k=1}^{i}\binom{i}{k}\sigma^k(1 - \sigma)^{i-k}(1 - P_m(n,t)^k)$). According to the value of $\sigma$, let us consider two subcases:

6 Here "excluding the possibility" of an event refers to proving that the probability of the event is super-polynomially close to 0.
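
The sum in the second event can be evaluated numerically and, by the binomial theorem, collapses to $1 - (1 - \sigma(1 - P_m))^i$. This closed form is a derivation added here as a cross-check and is not stated in the paper; the function names and sample values below are likewise assumptions made for illustration.

```python
import math

def some_dop_flip_survives(i: int, sigma: float, p_m: float) -> float:
    """Sum over k >= 1 of C(i,k) sigma^k (1-sigma)^(i-k) (1 - p_m^k):
    k matching bits are flipped by the DOP change, and at least one of
    them is not flipped back by mutation."""
    return sum(math.comb(i, k) * sigma**k * (1 - sigma)**(i - k) * (1 - p_m**k)
               for k in range(1, i + 1))

def closed_form(i: int, sigma: float, p_m: float) -> float:
    """Binomial-theorem simplification of the sum above (added cross-check)."""
    return 1 - (1 - sigma * (1 - p_m)) ** i

def decrease_lower_bound(n: int, i: int, sigma: float, p_m: float) -> float:
    """Product of the two event probabilities described in the text."""
    untouched = ((1 - sigma) * (1 - p_m)) ** (n - i)  # no non-matching bit flips
    return untouched * some_dop_flip_survives(i, sigma, p_m)

print(some_dop_flip_survives(50, 0.01, 0.02), closed_form(50, 0.01, 0.02))
print(decrease_lower_bound(60, 50, 0.01, 0.02))
```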

1. If $\sigma < 1/n$, according to Eq. (9) we know $\gamma = \sigma n^2/\log n$. Since $n - i + 1 \le 3G + 1 \le n/(\gamma^{1/7}\log n)$ holds, we have

T^'^loEJl 1 

N t < N t ^i I N t -i = i £ A 2 UAi, P m {n,t) < 1 —,a< - 

n n 

. /. n „v, \n (~\ & 1 ,\\ n -^+ 1 ^ ■ 7 1/14 logn\ T i/7" log „ ci 7logn 
> (1 - (1 - ay) (1 - a) (1-P m {n,t)) >ci-«oil y >— , 

where $c_1$ is a positive constant.

2. If $1/n \le \sigma \le \delta\log n/n$, according to Eq. (9) we know $\gamma = n/\log n$. Since $3G \le n/(\gamma^{1/7}\log n)$ and $i \ge n - 3G \ge n/2$ hold, we have

7 1 / 14 logn 1 <5 logn 

JV t < AT t _! I JV*_i = t e A 2 U Ai, P m {n, t) < - — , - < a < 2- 

n n n 
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> c 2 • I 1 — I ' ' 1Ui4 " I 1 - — I 7 8 = c 3 • ' 6 , 
where $c_2$ and $c_3$ are positive constants.
Combining the above two cases together, we know that

7 1 / 14 logn (Hogn\ r ci "i 7 logn 



7 ' logn dlogn\ r Ci 1 7 logn , , 

N t < N t -i N t -i =i G A 2 UAi,P m (n,t) < -i — ,ct < — > max J J. ;C3 L . 37 

n n J L 2 J n 

holds. 

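We now consider the second case in which the EA adopts LMR at the $t$-th generation. We estimate the following probability:

$$P\left(N_t^{(P)} < N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ \sigma \le \frac{\delta\log n}{n}\right) \ge \sum_{k=1}^{i}\binom{i}{k}\sigma^k(1 - \sigma)^{i-k}(1 - \sigma)^{n-i} \ge \left(1 - (1 - \sigma)^i\right)(1 - \sigma)^{n-i}.$$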

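Moreover, by applying Lemma 8.3, we have

$$P\left(N_t^{(O)} > N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ P_m(n,t) > \frac{\gamma^{1/14}\log n}{n},\ \sigma \le \frac{\delta\log n}{n}\right) \prec \frac{1}{SuperPoly(n)}.$$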

Combining the above two inequalities together and noting the fact that the selection operator always preserves the better one between the parent and offspring, we have

T 4 / 14 loEn <5 Ioe n 

N t < N t -! I N t -i = ieA 2 l) Ai, P m (n, t) > 1 2-, a < 2_ 

n n 
According to the value of $\sigma$, let us further consider two subcases:
1. If $\sigma < 1/n$, according to Eq. (9) we know $\gamma = \sigma n^2/\log n$. We have

-,1/14 \Qrr n 2. 

N t < N t -i I N-i = i G A 2 UAi, P m (n,t) > - — ,a < - 

n n 

> \-{l-{l-af){l-ar- i >\.{l-{l-af){l-aT>c i .ia>^. 1 -^, 

where $c_4$ is a positive constant.

2. If $1/n \le \sigma \le \delta\log n/n$, according to Eq. (9) we know $\gamma = n/\log n$. Since $3G \le n/(\gamma^{1/7}\log n)$ and $i \ge n - 3G \ge n/2$ hold, we have

jyl/ 44, Joe 71 1 (5 log n 
7V f < JV t _! I N t -! =1 e A 2 UAi,P m (n,t) > — < a < — 

71 71 71 

> 1.(1- (1-1)^(1-^ >C5 . (1 _ (T) 3 G>CS .( 1 _^)^ =C6 .I^ ) 

where $c_5$ and $c_6$ are positive constants.
Combining the above two cases together, we know that

$$\mathbb{P}\left(N_t < N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ P_m(n,t) \ge \frac{\gamma^{1/14}\log n}{n},\ \sigma < \frac{\delta\log n}{n}\right) \ge \max\left\{\frac{c_4}{2},\, c_6\right\}\cdot\frac{\gamma\log n}{n}$$



holds. Combining Eq. [37] with the above inequality, we have proven that 

$$p^-(n,i,t) \ge p_i^-(n) = \Theta\left(\frac{\gamma\log n}{n}\right).$$



Now let us prove the upper bound of p + (n, i, t). To increase the number of matching bits by DOP change and/or 
mutation, the number of flipped non-matching bits must be larger than the number of flipped matching bits after 
DOP change and/or mutation. According to Lemma [SJ concerning the relation between $N_t^{(P)}$ and $N_{t-1}$, we have

$$\mathbb{P}\left(N_t^{(P)} > N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ \sigma < \frac{\delta\log n}{n}\right) \le \frac{3G}{n-3G} \le \frac{3G}{n/2} \le \frac{6\gamma^{4/7}\log n}{n}. \qquad (38)$$

For the offspring at the $t$-th generation, we still need to consider two cases. In the first case, the EA adopts SMR
at the $t$-th generation; in the second case, the EA adopts LMR at the $t$-th generation.

We now consider the first case in which the EA adopts SMR. By applying Lemma [5] we have 

$$\mathbb{P}\left(N_t^{(O)} > N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ P_m(n,t) < \frac{\gamma^{1/14}\log n}{n},\ \sigma < \frac{\delta\log n}{n}\right) \le \frac{3G}{n-3G} \le \frac{3G}{n/2} \le \frac{6\gamma^{4/7}\log n}{n}, \qquad (39)$$

where we utilize the fact that the overall impact of DOP change and mutation on the offspring individual at the
$t$-th generation can be represented by an overall mapping with bitwise mapping rate
$r(n,t) = (1-\sigma)P_m(n,t) + (1-P_m(n,t))\sigma = P_m(n,t) + \sigma - 2P_m(n,t)\sigma$.
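The bitwise mapping rate above can be checked with a few lines of code. The following is a minimal sketch, assuming that the DOP change and the mutation flip each bit independently with probabilities `sigma` and `p_m`; the function names are hypothetical and are not taken from the paper. A bit ends up changed exactly when one of the two operations flips it and the other does not.

```python
import itertools


def effective_flip_rate(p_m: float, sigma: float) -> float:
    """Closed form r = (1 - sigma) * p_m + (1 - p_m) * sigma."""
    return (1 - sigma) * p_m + (1 - p_m) * sigma


def effective_flip_rate_by_enumeration(p_m: float, sigma: float) -> float:
    """Sum the probabilities of the outcomes in which the bit ends up changed.

    Assumption: the environment change and the mutation act independently.
    """
    total = 0.0
    for by_change, by_mutation in itertools.product([False, True], repeat=2):
        prob = (sigma if by_change else 1 - sigma) * (p_m if by_mutation else 1 - p_m)
        if by_change != by_mutation:  # exactly one flip leaves the bit changed
            total += prob
    return total


if __name__ == "__main__":
    p_m, sigma = 0.1, 0.2
    print(effective_flip_rate(p_m, sigma))
    print(effective_flip_rate_by_enumeration(p_m, sigma))
    print(p_m + sigma - 2 * p_m * sigma)
```

All three printed values agree up to floating-point rounding, which is the identity used in the text.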

Noting that $N_t \in \{N_t^{(P)}, N_t^{(O)}\}$, we obtain the following result by combining Eqs. [38] and [39] together:

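$$\begin{aligned}
&\mathbb{P}\left(N_t > N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ P_m(n,t) < \frac{\gamma^{1/14}\log n}{n},\ \sigma < \frac{\delta\log n}{n}\right) \\
&\quad \le \mathbb{P}\left(N_t^{(P)} > N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ \sigma < \frac{\delta\log n}{n}\right) \\
&\qquad + \mathbb{P}\left(N_t^{(O)} > N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ P_m(n,t) < \frac{\gamma^{1/14}\log n}{n},\ \sigma < \frac{\delta\log n}{n}\right) \\
&\quad \le \frac{12\gamma^{4/7}\log n}{n}. \qquad (40)
\end{aligned}$$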



On the other hand, we consider the case in which LMR is adopted by the EA. By applying Lemma [3]3, we know 

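$$\mathbb{P}\left(N_t^{(O)} > N_{t-1} \,\middle|\, N_{t-1} \in A_2 \cup A_1 \cup F_2,\ P_m(n,t) \ge \frac{\gamma^{1/14}\log n}{n},\ \sigma < \frac{\delta\log n}{n}\right) \prec \frac{1}{\mathrm{SuperPoly}(n)}.$$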



In other words, if the LMR is used, the number of matching bits found by the offspring at the $t$-th generation will not
be larger than $N_{t-1}$ with an overwhelming probability. Similar to Eq. [40], we obtain
Combining Eq. [40] with the above inequality, we obtain

$$\mathbb{P}\left(N_t > N_{t-1} \,\middle|\, N_{t-1} = i \in A_2 \cup A_1,\ \sigma < \frac{\delta\log n}{n}\right) \le \frac{12\gamma^{4/7}\log n}{n}.$$



In other words, we have proven Eq. [36].

Droste utilized the idea of "effective mutation" to prove Theorem 2 of [17]. Intuitively speaking, this technique
estimates a lower bound on the number of effective mutations (which can be interpreted as the number of generations
in which the number of matching bits changes) needed to reach the target, and it also estimates an upper bound on the
number of effective mutations that the EA can provide. If the former is significantly larger than the latter, then
the EA cannot reach the target. Following this intuitive idea, we now provide the formal proof of the final conclusion
of the theorem. By Eqs. 1551 and 1551 we obtain the upper bound of the probability of $N_t > N_{t-1}$, conditional on the
event that $N_t \ne N_{t-1}$:
$$\begin{aligned}
\mathbb{P}\left(N_t > N_{t-1} \,\middle|\, N_t \ne N_{t-1} = i \in A_2 \cup A_1,\ \sigma < \frac{\delta\log n}{n}\right)
&= \frac{p^+(n,i,t)}{p^+(n,i,t) + p^-(n,i,t)} \le \frac{p_i^+}{p_i^+ + p_i^-} \\
&= \frac{\Theta\left(\gamma^{4/7}\log n/n\right)}{\Theta\left(\gamma^{4/7}\log n/n\right) + \Theta\left(\gamma\log n/n\right)} = \Theta\left(\gamma^{-3/7}\right), \qquad (41)
\end{aligned}$$



where $\Theta$ refers to the asymptotic order with respect to the problem size $n$.
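The last step of Eq. (41) can be spelled out by dividing the numerator and the denominator by $\gamma\log n/n$. The short derivation below is a sketch under the assumption that $\gamma$ is bounded below by a positive constant, so that $\gamma^{-3/7} = O(1)$ and the denominator stays of constant order:

```latex
\frac{\Theta\!\left(\gamma^{4/7}\log n/n\right)}
     {\Theta\!\left(\gamma^{4/7}\log n/n\right) + \Theta\!\left(\gamma\log n/n\right)}
= \frac{\Theta\!\left(\gamma^{-3/7}\right)}
       {\Theta\!\left(\gamma^{-3/7}\right) + \Theta(1)}
= \Theta\!\left(\gamma^{-3/7}\right)
```

In words: among the generations in which the number of matching bits changes, the fraction that are improvements shrinks like $\gamma^{-3/7}$ as $\gamma$ grows.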

We now prove that the EA will spend a super-polynomial number of generations to travel through $A_1$ and reach
$F_2$. Let $T$ be defined formally as follows:



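$$T = \left|\left\{ t \in \mathbb{N}^+ \,\middle|\, N_{t-1} \ne N_t,\ N_{t-1} \in A_2 \cup A_1,\ \sigma < \frac{\delta\log n}{n} \right\}\right|. \qquad (42)$$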



Recall that in Proposition EQ]3, we have proven that the EA has to travel through the whole $A_1$ with length of $G$; in
Proposition Bj]4, we have proven that in one generation the number of matching bits cannot increase by more than
$U_2$ with an overwhelming probability. According to Propositions B[Tj3 and E[T]4, to travel through $A_1$ and reach $F_2$,
$T$ is lower bounded by $G/U_2 = \gamma^{3/7}\log n$ with an overwhelming probability. Formally we have



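$$\mathbb{P}\left(T \ge \gamma^{3/7}\log n\right) \succ 1 - \frac{1}{\mathrm{SuperPoly}(n)}. \qquad (43)$$

Further, let $T^+$ and $T^-$ be defined as follows:

$$T^+ = \left|\left\{ t \in \mathbb{N}^+ \,\middle|\, N_{t-1} < N_t,\ N_{t-1} \in A_2 \cup A_1,\ \sigma < \frac{\delta\log n}{n} \right\}\right|, \qquad (44)$$

$$T^- = \left|\left\{ t \in \mathbb{N}^+ \,\middle|\, N_{t-1} > N_t,\ N_{t-1} \in A_2 \cup A_1,\ \sigma < \frac{\delta\log n}{n} \right\}\right|, \qquad (45)$$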



and the above definitions imply that $T = T^+ + T^-$.



Moreover, by Proposition EQ]4, given the condition that the EA is in $A_2 \cup A_1$, in one generation the number of
matching bits cannot increase by more than $U_2$ with an overwhelming probability. Hence, among the $T$ generations,
the number of matching bits can increase by at most $T^+ U_2 - T^-$ with an overwhelming probability. To travel through
$A_1$, the following inequality must hold (a necessary condition):

$$T^+ U_2 - T^- \ge G.$$

Noting that $T = T^+ + T^-$, it follows that

$$T^+ \ge \frac{G+T}{U_2+1}.$$
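The rearrangement above only substitutes $T^- = T - T^+$ and collects the $T^+$ terms. As a sanity check, the following sketch compares the two forms of the condition over a grid of small non-negative integers; the function names and the grid size are illustrative assumptions, not part of the paper.

```python
import itertools


def necessary_condition(t_plus: int, t_minus: int, u2: int, g: int) -> bool:
    """Original form: T+ * U2 - T- >= G."""
    return t_plus * u2 - t_minus >= g


def rearranged_condition(t_plus: int, t_minus: int, u2: int, g: int) -> bool:
    """Rearranged form T+ >= (G + T) / (U2 + 1), with T = T+ + T-.

    Written in cross-multiplied form to stay in exact integer arithmetic.
    """
    total = t_plus + t_minus
    return t_plus * (u2 + 1) >= g + total


def conditions_agree(limit: int = 8) -> bool:
    """Return True if both forms agree on every tuple in the grid."""
    values = range(limit)
    for t_plus, t_minus, u2, g in itertools.product(values, repeat=4):
        if necessary_condition(t_plus, t_minus, u2, g) != rearranged_condition(
            t_plus, t_minus, u2, g
        ):
            return False
    return True


if __name__ == "__main__":
    print(conditions_agree())
```

The check is exhaustive only on the chosen grid; the algebraic equivalence itself holds for all values because $U_2 + 1 > 0$.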

Recalling the definitions of $G$, $T$ and $U_2$, we have

$$\frac{G+T}{U_2+1} \ge d\cdot\frac{T}{\gamma^{1/7}},$$

where $d$ is a positive constant. Combining the above two inequalities together, we know that the following condition
must be satisfied so as to travel through $A_1$:

$$T^+ \ge d\cdot\frac{T}{\gamma^{1/7}}. \qquad (46)$$

Next, we only need to prove that the above condition cannot be satisfied with an overwhelming probability (thus the
EA cannot travel through $A_1$ with an overwhelming probability). It follows from Eqs. [41] and [42] that



$$\mathbb{E}\left[T^+ \mid T\right] = O\left(\frac{T}{\gamma^{3/7}}\right),$$



where $O$ refers to the asymptotic order with respect to the problem size $n$. By Chernoff bounds, we estimate the probability
of Eq. [46]:



7 V7 1 j y dT J V7 2/7 

where $d$ is a positive constant. Meanwhile, Eq. [43] implies that

$$\mathbb{P}\left(T \ge \gamma^{3/7}\log n\right) \succ 1 - \frac{1}{\mathrm{SuperPoly}(n)}.$$
By the total probability theorem [3^, we obtain 

$$\mathbb{P}\left(T^+ \ge d\cdot\frac{T}{\gamma^{1/7}}\right) \prec \frac{1}{\mathrm{SuperPoly}(n)}.$$
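To give a feel for the size of the gap exploited by the Chernoff step, the following sketch evaluates a standard multiplicative Chernoff bound, $\mathbb{P}(X \ge (1+\epsilon)\mu) \le \left(e^{\epsilon}/(1+\epsilon)^{1+\epsilon}\right)^{\mu}$. It is not the exact inequality used in the proof. It assumes, purely for illustration, that $T^+$ behaves like a sum of $T$ independent indicator variables with mean at most $c\,T/\gamma^{3/7}$; the constants `c` and `d` and the function names are hypothetical. The bound is returned on a logarithmic scale to avoid floating-point underflow.

```python
import math


def log_chernoff_upper_tail(mu: float, threshold: float) -> float:
    """Natural log of the multiplicative Chernoff bound on P(X >= threshold).

    X is assumed to be a sum of independent 0/1 variables with mean at most mu.
    Returns 0.0 (the trivial bound 1) when threshold does not exceed mu.
    """
    if threshold <= mu:
        return 0.0
    ratio = threshold / mu  # this is 1 + epsilon
    epsilon = ratio - 1.0
    return mu * (epsilon - ratio * math.log(ratio))


def log_bound_for_improving_steps(gamma: float, T: float, d: float = 1.0, c: float = 1.0) -> float:
    """Log-bound on P(T+ >= d * T / gamma^(1/7)) when E[T+ | T] <= c * T / gamma^(3/7)."""
    mu = c * T / gamma ** (3.0 / 7.0)
    threshold = d * T / gamma ** (1.0 / 7.0)
    return log_chernoff_upper_tail(mu, threshold)


if __name__ == "__main__":
    # Illustrative values only: gamma = 10^7 gives gamma^(3/7) = 10^3, gamma^(1/7) = 10.
    print(log_bound_for_improving_steps(gamma=1e7, T=10_000))
```

With these illustrative values the required number of improving steps is about 100 times the expected number, so the logarithm of the bound is a large negative number. In the proof, the same kind of gap, together with $T \ge \gamma^{3/7}\log n$, is what makes the failure probability negligible.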

In other words, the condition $T^+ U_2 - T^- \ge G$ does not hold with an overwhelming probability, which implies that
the EA cannot travel through $A_1$ and reach $F_2$ with an overwhelming probability, given the condition that it has
already reached $A_2$. Consequently, we have proven that the EA cannot reach $F_2$ within a polynomial first hitting time
with an overwhelming probability. Formally, let $\tau_{F_2}$ be defined as follows:

$$\tau_{F_2} = \min\left\{ t \ge 0;\ \left(N_t^{(P)} \in F_2\right) \vee \left(N_t^{(O)} \in F_2\right) \right\}. \qquad (47)$$

We have proven that

$$\mathbb{P}\left(\tau_{F_2} \prec \mathrm{Poly}(n)\right) \prec \frac{1}{\mathrm{SuperPoly}(n)},$$

which leads to Theorems [1] and [2] according to the definition of $F_2$ in Definition [10]. □